Abstract - The entropy computation of Gaussian mixture distributions
with a large number of components has a prohibitive computational
complexity. In this paper, we propose a novel approach exploiting the
sphere decoding concept to bound and approximate such entropy terms
with reduced complexity and good accuracy. Moreover, we propose an
SNR region-based enhancement of the approximation method to reduce
the complexity even further. Using Monte-Carlo simulations, the
proposed methods are numerically demonstrated for the computation of
the mutual information, including the entropy term, of various
channels with finite constellation modulations such as binary and
quadrature amplitude modulation (QAM) inputs for communication
applications.

Index Terms - Gaussian mixture distribution, Entropy approximation,
Mutual information, Finite input alphabet, Sphere decoding

I. Introduction 

In general, computations involving Gaussian mixture distributions
with a large number of components have a prohibitive computational
complexity, yet they arise in a wide range of application areas
including communications [1]-[4], data fusion [5]-[8], machine
learning [9], [10], image and pattern recognition [11], [12], and
target tracking [13], [14]. For instance, the computation of mutual
information in communications results in the problem of computing
entropy terms of a large system with a finite input alphabet, which
has a prohibitive computational complexity since the number of
possible inputs grows exponentially with the system dimension.
Moreover, in data fusion and target tracking applications, computing
the full Gaussian mixture distribution of a sampled data set has
prohibitive complexity for high dimensions or a large data set.

In the data fusion and tracking areas, Gaussian mixture reduction is
commonly used to reduce the problem size and to bound the
computational complexity and required memory size [5]-[8], [13],
[14]. However, most Gaussian mixture reduction algorithms assume that
the true Gaussian mixture distribution of a sampled data set is known
and start from it to reduce the number of components by merging,
pruning, and expanding based on distance measures such as the
integral squared error (ISE) and the Kullback-Leibler (KL)
divergence. They are therefore intractable for high dimensions, since
this approach requires the computation of the distance measures among
all possible components. In this paper, we propose a different
approximation approach, although in principle it is also a Gaussian
mixture reduction.

On the other hand, there have been several approaches in
communications to approximate the mutual information or the entropy
of Gaussian mixture distributions both analytically and numerically.
Huber et al. [2] proposed an entropy approximation of Gaussian
mixture random vectors based on a Taylor series expansion, which does
not scale to a large system size. Girnyk et al. [4] analyzed the
capacity of a large multiple-input multiple-output (MIMO) system with
a finite input alphabet based on the matrix replica method. This
approach is only applicable to computing the average capacity of an
independent and identically distributed (i.i.d.) MIMO channel with
infinite dimension. Arnold et al. [1] proposed a simulation-based
computation of the mutual information of a time-invariant
discrete-time channel with memory. Dauwels and Loeliger [15] extended
the approach to continuous state spaces, and Molkaraie and Loeliger
[16] applied it to the computation of information rates of
two-dimensional channels, whose main application is magnetic
recording. Although this allows the approximation of the mutual
information with a long block length, the method is limited to
time-invariant frequency-selective fading channels with a relatively
short finite impulse response (FIR) length. Zhu et al. [3] proposed a
statistical computation approach for MIMO channels with a finite
alphabet depending on the signal-to-noise ratio (SNR). Even though
this approach offers very low complexity for arbitrarily structured
channels with high dimension, its accuracy at moderate SNR, which is
especially important for practical systems, is not acceptable.

In this paper, our main contribution is a novel approximation method
with low complexity and good accuracy for the mutual information of
arbitrarily structured channels with high dimension, which also leads
to new upper and lower bounds. The main idea is to find the
$N$-closest Gaussian components through an efficient tree search
algorithm and to approximate the true Gaussian mixture distribution
by a reduced Gaussian mixture distribution. Based on this approach,
we provide upper and lower bounds computable with reduced complexity
and, further, an approximation with significantly reduced complexity,
which can be computed even for high dimensional cases. Although we
focus on communication problems in this paper, it is worth mentioning
that the proposed method has many general applications where a
reduction of the Gaussian mixture is needed.

The rest of this paper is organized as follows. In Section II, the
problem definition including a basic system model is presented. In
Section III, we review the sphere decoding tree search algorithm.
Novel sphere decoder approximations on the entropy are provided in
Section IV. In Section V, an SNR-based enhanced approximation
algorithm suitable for high dimension is proposed. In Section VI,
several numerical examples are discussed for various channels.
Finally, concluding remarks are provided in Section VII.

II. Problem Definition 

The Gaussian mixture distribution is a weighted sum of Gaussian
distributions with different mean and/or variance, which is
mathematically modeled as

$$p(\mathbf{x}) = \sum_{i=1}^{N_g} w_i\, g_i(\mathbf{x}),$$ (1)

where $\mathbf{x}$ denotes the complex-valued input vector, $N_g$
denotes the total number of Gaussian components, $w_i$ denotes the
non-negative weight factor for the $i$-th Gaussian component with
$\sum_{i=1}^{N_g} w_i = 1$, and $g_i(\mathbf{x})$ denotes the $i$-th
Gaussian component following a complex Gaussian distribution with
mean $\boldsymbol{\mu}_i$ and covariance $\boldsymbol{\Sigma}_i$,
i.e., $g_i(\mathbf{x}) \sim \mathcal{CN}(\boldsymbol{\mu}_i, \boldsymbol{\Sigma}_i)$.
For large $N_g$, the computation of $p(\mathbf{x})$ has a high
complexity, and therefore reducing the number of components is the
main approach of previous work on the Gaussian mixture reduction
problem.

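In this paper, we consider the following basic system equation, which
is common for many communication systems:

$$\mathbf{z} = \mathbf{H}\mathbf{d} + \mathbf{n},$$ (2)

where $\mathbf{z} \in \mathbb{C}^{N_r}$ denotes the received signal
vector, $\mathbf{d} \in \mathcal{M}^{N_t}$ denotes the input symbol
vector where each symbol $d_k$ is taken from a finite constellation
set $\mathcal{M} \subset \mathbb{C}$,
$\mathbf{H} \in \mathbb{C}^{N_r \times N_t}$ denotes an arbitrarily
structured channel matrix, $\mathbf{n}$ denotes the additive white
Gaussian noise vector, $\mathbf{n} \sim \mathcal{CN}(\mathbf{0}, \mathbf{I})$,
and the transmitted power (equivalently, the SNR due to the
normalized unit noise variance) is given by
$\rho = \mathrm{E}[\mathbf{d}^H \mathbf{d}]$. Then, the mutual
information between the input $\mathbf{d}$ and the output
$\mathbf{z}$ in (2) can be expressed by the differential entropies as
follows:

$$I(\mathbf{z}; \mathbf{d}) = h(\mathbf{z}) - h(\mathbf{z}|\mathbf{d}) = h(\mathbf{z}) - h(\mathbf{n}) = -\mathrm{E}[\log_2 f_{\mathbf{z}}(\mathbf{z})] - \log_2 \det(\pi e \mathbf{I}),$$ (3)

where $f_{\mathbf{z}}(\mathbf{z})$¹ denotes the probability density
function (pdf) of $\mathbf{z}$, which is a Gaussian mixture
distribution given by

$$f_{\mathbf{z}}(\mathbf{z}) = \sum_{i=1}^{M_c^{N_t}} p(\mathbf{d}_i)\, f_{\mathbf{z}|\mathbf{d}}(\mathbf{z}|\mathbf{d}_i),$$ (4)

where $M_c$ denotes the number of constellation points and
$\mathbf{d}_i$ denotes the $i$-th input symbol vector among the
$M_c^{N_t}$ possibilities.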
For practical communication problems the components of
$\mathbf{d}_i$ are usually assumed to be independent and uniformly
distributed (i.u.d.), i.e., $p(\mathbf{d}_i) = M_c^{-N_t}$. Note that
for large $N_t$, the computation of (4) is infeasible due to the
exponentially increasing number of input vectors. Since the
computation of the expectation in (3) can be easily handled by
Monte-Carlo simulation, the problem at hand is to approximate (4).
In general, for a given $\mathbf{z}$, only a few terms in the sum in
(4) have a significant contribution. Therefore, finding those
components that contribute the most is our main approach for the
approximation in the rest of this paper.

¹We drop the subindex when it is clear from the context.
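
To make the cost of the exact computation concrete, the following is a minimal sketch of the brute-force evaluation of (4) combined with a Monte-Carlo estimate of (3). It assumes i.u.d. inputs and unit-variance complex noise as in the system model; the function names `mixture_pdf` and `mutual_information_mc` and the plain nested-list matrix representation are illustrative choices, not part of the paper.

```python
import itertools
import math
import random


def mixture_pdf(z, H, symbols):
    """Brute-force f_z(z) of (4) for i.u.d. inputs and CN(0, I) noise."""
    n_r, n_t = len(H), len(H[0])
    total = 0.0
    # Enumerate all M_c^{N_t} input vectors: this is the exponential cost.
    for d in itertools.product(symbols, repeat=n_t):
        dist2 = sum(
            abs(z[r] - sum(H[r][t] * d[t] for t in range(n_t))) ** 2
            for r in range(n_r)
        )
        total += math.exp(-dist2)
    return total / (len(symbols) ** n_t * math.pi ** n_r)


def mutual_information_mc(H, symbols, n_samples, rng):
    """Monte-Carlo estimate of (3): -E[log2 f_z(z)] - log2 det(pi e I)."""
    n_r, n_t = len(H), len(H[0])
    sigma = math.sqrt(0.5)  # real and imaginary parts of CN(0, 1)
    acc = 0.0
    for _ in range(n_samples):
        d = [rng.choice(symbols) for _ in range(n_t)]
        z = [
            sum(H[r][t] * d[t] for t in range(n_t))
            + complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
            for r in range(n_r)
        ]
        acc -= math.log2(mixture_pdf(z, H, symbols))
    h_z = acc / n_samples
    h_n = n_r * math.log2(math.pi * math.e)
    return h_z - h_n
```

Each call to `mixture_pdf` visits every one of the $M_c^{N_t}$ input vectors, which is exactly the term that the remainder of the paper seeks to avoid.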


III. A Review of Sphere Decoding Tree Search 
Our proposed bounds and approximation, presented in the following
sections, are inspired by the sphere decoding (SD) algorithm
Ei-iiia, which is a well-known maximum likelihood (ML) 
branch-and-bound tree search algorithm for MIMO detection, i.e., for
finding the most likely input vector $\mathbf{d}$ given the received
vector $\mathbf{z}$, and the soft SD algorithm [26], whose principle
can be used for capacity approximation as shown in the following. The
motivation is that it can reduce the search space and, thus, the
required computations via an efficient tree search. Here, we briefly
review the SD algorithm.

In order to construct a search tree, the SD algorithm first performs
a QR factorization of the channel matrix $\mathbf{H}$. Then, the
system equation (2) is equivalently given by

$$\mathbf{v} = \mathbf{R}\mathbf{d} + \mathbf{w},$$ (5)

with $\mathbf{H} = \mathbf{Q}\mathbf{R}$ in which $\mathbf{Q}$ is a
unitary matrix and $\mathbf{R}$ is an upper triangular matrix,
$\mathbf{v} = \mathbf{Q}^H\mathbf{z}$,
$\mathbf{w} = \mathbf{Q}^H\mathbf{n} \sim \mathcal{CN}(\mathbf{0}, \mathbf{I})$,
and $\mathbf{d} = [d_1, \ldots, d_{N_t}]^T$. It is worth noting that
since any invertible linear operation does not change the mutual
information [27], $I(\mathbf{z}; \mathbf{d}) = I(\mathbf{v}; \mathbf{d})$.
Then, a search tree is constructed from the bottom to the top of the
equivalent upper-triangular channel matrix $\mathbf{R}$. That is, the
first branches from the root node are constructed from the last
diagonal term of $\mathbf{R}$ corresponding to $d_{N_t}$, and so on
until the last branches to the leaf nodes are constructed from the
first row of $\mathbf{R}$ corresponding to $d_1$. Let $r_{i,j}$
denote the $(i,j)$-th element of $\mathbf{R}$. Then, at the $k$-th
depth, the cost value corresponding to the Euclidean distance between
the received vector $\mathbf{v}$ and the considered input
$\mathbf{d}$ can be recursively expressed as

$$c\big(k, \mathbf{d}_{N_t-k+1}^{N_t}\big) = c\big(k-1, \mathbf{d}_{N_t-k+2}^{N_t}\big) + \Big| v_{N_t-k+1} - \sum_{j=N_t-k+1}^{N_t} r_{N_t-k+1,j}\, d_j \Big|^2,$$ (6)

where $k \in \{1, \ldots, N_t\}$,
$c\big(0, \mathbf{d}_{N_t+1}^{N_t}\big) = 0$,
$\mathbf{d}_i^j = [d_i, d_{i+1}, \ldots, d_j]^T$, and
$\mathbf{v} = [v_1, \ldots, v_{N_t}]^T$. Fig. 1 illustrates an
example of SD search tree construction for the case of 4-quadrature
amplitude modulation (QAM) and $N_t = 3$, resulting in $4^3 = 64$
possibilities.
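
The recursive cost can be sketched in a few lines. The sketch below assumes a square upper-triangular $\mathbf{R}$ stored as nested lists with 0-based indices; `partial_cost` and `full_cost` are hypothetical helper names. Each depth adds the squared residual of one row, starting from the last row, so the cost accumulated at depth $N_t$ equals $\|\mathbf{v} - \mathbf{R}\mathbf{d}\|^2$.

```python
def partial_cost(k, v, R, d_tail, prev_cost):
    """Cost at depth k given the cost at depth k-1.

    d_tail holds the k symbols d_{N_t-k+1}, ..., d_{N_t} (in that order).
    """
    n_t = len(R)
    row = n_t - k  # 0-based index of row N_t-k+1
    interference = sum(R[row][row + j] * d_tail[j] for j in range(k))
    return prev_cost + abs(v[row] - interference) ** 2


def full_cost(v, R, d):
    """Accumulate the cost from depth 1 (last row) to depth N_t (first row)."""
    n_t = len(R)
    cost = 0.0
    for k in range(1, n_t + 1):
        cost = partial_cost(k, v, R, d[n_t - k:], cost)
    return cost


def squared_distance(v, R, d):
    """Direct evaluation of ||v - R d||^2 for an upper-triangular R."""
    n_t = len(R)
    return sum(
        abs(v[i] - sum(R[i][j] * d[j] for j in range(i, n_t))) ** 2
        for i in range(n_t)
    )
```

Because every added term is non-negative, the cost never decreases with depth, which is what allows a branch to be discarded as soon as its partial cost exceeds the sphere radius.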


A. Depth-First Search (DFS) 

The DFS algorithm searches for components with a distance less than
the sphere radius in both forward and backward directions among the
sub-trees. It first descends through the search tree to a leaf node
in the forward direction of $k = 1, 2, \ldots, N_t$ and then moves
backward in the direction of $N_t, N_t - 1, \ldots, 1$. Fig. 2(a)
illustrates an example of the DFS.

Fig. 1. An example of SD search tree (e.g., 4-QAM and $N_t = 3$).

Fig. 2. Examples of (a) DFS and (b) BFS (e.g., binary input and $N_t = 3$).
The gray arrows denote the search movements. The black/white circle denotes
the visited/non-visited node. The dashed line denotes the pruned branch.

The DFS algorithm efficiently provides the optimal ML solution
corresponding to the closest input symbol vector for traditional MIMO
detection. Moreover, during the tree search, if it finds an input
symbol vector with a shorter distance than the sphere radius, the
sphere radius can be dynamically updated, which reduces the tree
search complexity when the purpose is to find only the closest
component. In this paper, however, the purpose of the tree search is
to find all components within a given sphere radius. Therefore, we
use a fixed sphere radius and do not consider its dynamic update. As
a result, the tree search is guaranteed to find all input symbol
vectors with a shorter distance than the sphere radius. Denoting the
number of components within the sphere radius as $N$, the
$N$-closest components² can be found during the tree search.
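
A minimal sketch of a fixed-radius depth-first search is given below. It assumes a square upper-triangular matrix and a squared radius supplied by the caller; the name `dfs_sphere` and the tuple-based bookkeeping are illustrative. In addition to the vectors inside the sphere, it accumulates, for every pruned node at depth $k$, the term $M_c^{N_t-k}\exp(-c)$, which is the quantity that a bound on the discarded components can reuse.

```python
import math


def dfs_sphere(v, R, symbols, radius2):
    """Depth-first search with a fixed squared radius.

    Returns (found, pruned_sum), where found is a list of (cost, d) sorted
    by cost with cost <= radius2, and pruned_sum adds M^(N_t-k) * exp(-c)
    for every node pruned at depth k with partial cost c.
    """
    n_t = len(R)
    m = len(symbols)
    found = []
    pruned_sum = 0.0

    def visit(k, tail, cost):
        nonlocal pruned_sum
        row = n_t - k  # row of R handled at depth k
        for s in symbols:
            d = (s,) + tail
            interference = sum(R[row][row + j] * d[j] for j in range(k))
            c = cost + abs(v[row] - interference) ** 2
            if c <= radius2:
                if k == n_t:
                    found.append((c, d))  # leaf inside the sphere
                else:
                    visit(k + 1, d, c)  # go one depth further
            else:
                # The whole sub-tree below this node is discarded.
                pruned_sum += m ** (n_t - k) * math.exp(-c)

    visit(1, (), 0.0)
    found.sort(key=lambda item: item[0])
    return found, pruned_sum
```

Since the radius is never tightened during the search, every vector inside the sphere is returned, not only the closest one.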

B. Breadth-First Search (BFS) 

The BFS algorithm searches for components in the forward direction
only. That is, it searches all nodes at a certain depth and then
moves to the next depth. Fig. 2(b) illustrates an example of the BFS.

In most applications of MIMO detection, the BFS algorithm keeps just
the $K$-best components and prunes the other branches at each depth.
This is called the $K$-best SD algorithm [24], [25], [29]. In this
case, if $K$ is sufficiently large, the solution approaches the
optimal ML solution. In contrast, limiting $K$ reduces the search
complexity and fixes it in advance. This fixed complexity is the main
advantage of the $K$-best SD algorithm, since it is easily
implemented in a parallel and pipelined fashion. From the viewpoint
of finding the $N$-closest components in our problem, this approach
also provides a fixed complexity determined by $K$, even though the
components found at the end are not guaranteed to be the $N$-closest
components.

²We can also fix the number of components $N$ and update the sphere
radius as often as $N$ components are found. Then, we have $N$ candidates
found during the tree search.
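
The $K$-best rule can be sketched as follows, assuming a square upper-triangular matrix stored as nested lists; `k_best_bfs` is a hypothetical name. At every depth, each surviving partial vector is extended by all constellation symbols, the candidates are sorted by cost, and only the $K$ cheapest survive.

```python
def k_best_bfs(v, R, symbols, K):
    """Breadth-first K-best search; returns up to K (cost, d) pairs by cost."""
    n_t = len(R)
    survivors = [(0.0, ())]  # the root: zero cost, empty symbol vector
    for k in range(1, n_t + 1):
        row = n_t - k
        candidates = []
        for cost, tail in survivors:
            for s in symbols:
                d = (s,) + tail
                interference = sum(R[row][row + j] * d[j] for j in range(k))
                candidates.append((cost + abs(v[row] - interference) ** 2, d))
        # Keep only the K cheapest partial vectors at this depth.
        candidates.sort(key=lambda item: item[0])
        survivors = candidates[:K]
    return survivors
```

The number of expanded nodes per depth is at most $M_c K$ regardless of the received vector, which is the fixed-complexity property discussed above; the price is that a vector discarded early cannot be recovered even if it would have ended up among the closest ones.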

IV. Sphere Decoder Approximation 

In this section, we exploit the SD algorithm in a different manner in
order to find approximations and bounds on the entropy of Gaussian
mixture distributions. While the aim of the original SD algorithm is
to find only the closest input vector, we find the $N$-closest input
vectors, which contribute the most to $f(\mathbf{z})$, through an
efficient tree search. We propose two approaches employing the DFS
and the BFS, respectively. The two approaches offer different means
of controlling accuracy and complexity, although the basic principle
is the same. The resulting bounds also serve as approximations. From
the simulations, we see that the upper bound is usually close to the
true curve
(refer to Fig. 0 (a). Fig. |2l and Fig. [8] (a)). 

A. DFS-Based Upper and Lower Bounds 

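Starting from (5), the DFS-based algorithm finds input symbol vectors
satisfying

$$\|\mathbf{v} - \mathbf{R}\mathbf{d}\|^2 \le \xi^2,$$ (7)

where the squared sphere radius $\xi^2$ is set to

$$\xi^2 = \alpha \|\mathbf{v} - \mathbf{R}\mathbf{d}_0\|^2,$$ (8)

where $\mathbf{d}_0$ denotes the Babai estimate³ [30] and $\alpha$
denotes a control parameter which can be used to adjust complexity
versus accuracy. If we increase $\alpha$, the accuracy increases
since the search result can include more components due to the larger
search radius, while the complexity also increases since more
searches in the tree are required. The full tree search, i.e., the
true distribution, is obtained when $\alpha \to \infty$. Note that if
$\alpha \ge 1$, the sphere radius in (8) guarantees that at least one
component is found in the tree search because the sphere includes at
least $\mathbf{d}_0$. After the SD tree search, the following set of
ordered symbol vectors is found:

$$\mathcal{D}_{\mathrm{DFS}}^{(\xi)} = \big\{\mathbf{d}_1, \mathbf{d}_2, \ldots, \mathbf{d}_{N_{\mathrm{DFS}}^{(\xi)}}\big\},$$ (9)

where $\mathcal{D}_{\mathrm{DFS}}^{(\xi)} \subseteq \mathcal{D} = \mathcal{M}^{N_t}$,
$|\mathcal{D}| = M_c^{N_t}$,
$N_{\mathrm{DFS}}^{(\xi)} = |\mathcal{D}_{\mathrm{DFS}}^{(\xi)}|$, and
$\|\mathbf{v} - \mathbf{R}\mathbf{d}_1\|^2 \le \|\mathbf{v} - \mathbf{R}\mathbf{d}_2\|^2 \le \ldots \le \|\mathbf{v} - \mathbf{R}\mathbf{d}_{N_{\mathrm{DFS}}^{(\xi)}}\|^2$.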

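³Equivalently, it is the zero-forcing (ZF) point found as
$\mathbf{d}_0 = \mathbf{H}^{\dagger}\mathbf{z}$ where
$\mathbf{H}^{\dagger} = (\mathbf{H}^H\mathbf{H})^{-1}\mathbf{H}^H$.

Assuming i.u.d. input $\mathbf{d}$, the true pdf $f(\mathbf{z})$ can
be expressed as

$$f_{\mathbf{z}}(\mathbf{z}) = \sum_{\mathbf{d}\in\mathcal{D}} p(\mathbf{d})\, f_{\mathbf{z}|\mathbf{d}}(\mathbf{z}|\mathbf{d}) = \sum_{\mathbf{d}\in\mathcal{D}} p(\mathbf{d})\, f_{\mathbf{v}|\mathbf{d}}(\mathbf{v}|\mathbf{d}) = \frac{1}{M_c^{N_t}\pi^{N_r}} \sum_{\mathbf{d}\in\mathcal{D}} \exp\big(-\|\mathbf{v} - \mathbf{R}\mathbf{d}\|^2\big),$$ (10)

where the second equality is obtained from the fact that
$\|\mathbf{v} - \mathbf{R}\mathbf{d}\|^2 = \|\mathbf{z} - \mathbf{H}\mathbf{d}\|^2$
due to the unitary $\mathbf{Q}$. Therefore, $f_{\mathbf{z}}(\mathbf{z})$
is equal to $f_{\mathbf{v}}(\mathbf{v})$. Accordingly, we have
$h(\mathbf{z}) = h(\mathbf{v})$ and
$I(\mathbf{z};\mathbf{d}) = I(\mathbf{v};\mathbf{d})$. Let
$T = \sum_{\mathbf{d}\in\mathcal{D}} \exp(-D(\mathbf{d}))$, where
$D(\mathbf{x}) = \|\mathbf{v} - \mathbf{R}\mathbf{x}\|^2$. For the
input symbol vectors ordered with respect to the distance, i.e.,
$\mathcal{D} = \{\mathbf{d}_1, \mathbf{d}_2, \ldots, \mathbf{d}_{M_c^{N_t}}\}$,
the following relations hold after the SD tree search:

$$\exp(-D(\mathbf{d}_1)) \ge \cdots \ge \exp\big(-D(\mathbf{d}_{N_{\mathrm{DFS}}^{(\xi)}})\big) \ge \exp(-\xi^2) > \exp\big(-D(\mathbf{d}_{N_{\mathrm{DFS}}^{(\xi)}+1})\big) \ge \cdots \ge \exp\big(-D(\mathbf{d}_{M_c^{N_t}})\big).$$ (11)

Thus, $T$ can be expressed in two parts:

$$T = \underbrace{\sum_{\mathbf{d}\in\mathcal{D}_{\mathrm{DFS}}^{(\xi)}} \exp(-D(\mathbf{d}))}_{\text{components found}} + \underbrace{\sum_{\mathbf{d}\in\mathcal{D}\setminus\mathcal{D}_{\mathrm{DFS}}^{(\xi)}} \exp(-D(\mathbf{d}))}_{\text{components pruned}}.$$ (12)

The second term for the pruned components is upper-bounded by
$\big(|\mathcal{D}| - N_{\mathrm{DFS}}^{(\xi)}\big)\exp(-\xi^2)$.
Therefore, $T$ can be bounded as follows:

$$\sum_{\mathbf{d}\in\mathcal{D}_{\mathrm{DFS}}^{(\xi)}} \exp\big(-\|\mathbf{v} - \mathbf{R}\mathbf{d}\|^2\big) \le T \le \sum_{\mathbf{d}\in\mathcal{D}_{\mathrm{DFS}}^{(\xi)}} \exp\big(-\|\mathbf{v} - \mathbf{R}\mathbf{d}\|^2\big) + \big(|\mathcal{D}| - N_{\mathrm{DFS}}^{(\xi)}\big)\exp(-\xi^2).$$ (13)

Let us define $\underline{f}_{\mathrm{DFS}}(\mathbf{v})$ and
$\overline{f}_{\mathrm{DFS}}(\mathbf{v})$ by

$$\underline{f}_{\mathrm{DFS}}(\mathbf{v}) = \frac{1}{M_c^{N_t}\pi^{N_r}} \sum_{\mathbf{d}\in\mathcal{D}_{\mathrm{DFS}}^{(\xi)}} \exp\big(-\|\mathbf{v} - \mathbf{R}\mathbf{d}\|^2\big),$$ (14)

$$\overline{f}_{\mathrm{DFS}}(\mathbf{v}) = \underline{f}_{\mathrm{DFS}}(\mathbf{v}) + \frac{|\mathcal{D}| - N_{\mathrm{DFS}}^{(\xi)}}{M_c^{N_t}\pi^{N_r}} \exp(-\xi^2).$$ (15)

Then, the differential entropy of $\mathbf{z}$ is bounded by

$$h_{\mathrm{DFS}}^{\mathrm{lo}} \le h(\mathbf{z}) \le h_{\mathrm{DFS}}^{\mathrm{up}},$$ (16)

where $h_{\mathrm{DFS}}^{\mathrm{up}} = -\mathrm{E}[\log_2 \underline{f}_{\mathrm{DFS}}(\mathbf{v})]$
and $h_{\mathrm{DFS}}^{\mathrm{lo}} = -\mathrm{E}[\log_2 \overline{f}_{\mathrm{DFS}}(\mathbf{v})]$,
since $\underline{f}_{\mathrm{DFS}}(\mathbf{v}) \le f(\mathbf{z}) \le \overline{f}_{\mathrm{DFS}}(\mathbf{v})$
for all $\mathbf{v} = \mathbf{Q}^H\mathbf{z}$.

Enhanced Lower Bound: During the tree search, a pruned branch
including its sub-branches has a distance value greater than
$\xi^2$. Let the cost value of the pruned branch at the $k$-th depth
of the search tree be denoted by
$c\big(k, \mathbf{d}_{N_t-k+1}^{N_t}\big)$, where
$\mathbf{d}_{N_t-k+1}^{N_t} = [d_{N_t-k+1}, \ldots, d_{N_t}]^T$ is the
input symbol vector with length $k$ found in the previous and current
depth searches. Then, the pruned branch includes $M_c^{N_t-k}$
sub-branches, and the symbol vectors corresponding to the
sub-branches can use $c\big(k, \mathbf{d}_{N_t-k+1}^{N_t}\big)$
instead of $\xi^2$ for the $\exp(-\xi^2)$ term in (13).

In more detail, denote the remaining Euclidean distance values at the
leaf nodes for each sub-branch by
$c\big(N_t, \mathbf{d}_1^{N_t-k}\big) = c\big(N_t, \mathbf{d}_1^{N_t}\big) - c\big(k, \mathbf{d}_{N_t-k+1}^{N_t}\big) \ge 0$,
where $\mathbf{d}_i^j = [d_i, \ldots, d_j]^T$. Since for the pruned
branch
$\xi^2 < c\big(k, \mathbf{d}_{N_t-k+1}^{N_t}\big) \le c\big(N_t, \mathbf{d}_1^{N_t}\big) = c\big(k, \mathbf{d}_{N_t-k+1}^{N_t}\big) + c\big(N_t, \mathbf{d}_1^{N_t-k}\big)$,
replacing $\xi^2$ by $c\big(k, \mathbf{d}_{N_t-k+1}^{N_t}\big)$ for
all the pruned branches yields a better lower bound on the entropy.

Let us define $\overline{f}^{+}_{\mathrm{DFS}}(\mathbf{v})$ by

$$\overline{f}^{+}_{\mathrm{DFS}}(\mathbf{v}) = \underline{f}_{\mathrm{DFS}}(\mathbf{v}) + \frac{1}{M_c^{N_t}\pi^{N_r}} \sum_{\mathbf{d}\in\mathcal{D}\setminus\mathcal{D}_{\mathrm{DFS}}^{(\xi)}} \exp(-c(\mathbf{d})),$$ (17)

where $c(\mathbf{d})$ denotes the cost value of $\mathbf{d}$ at its
own pruned depth. For instance, if $\mathbf{d}$ is pruned at depth
$k$, $c(\mathbf{d}) = c\big(k, \mathbf{d}_{N_t-k+1}^{N_t}\big)$.
Then, the differential entropy of $\mathbf{z}$ obtains the enhanced
lower bound

$$h_{\mathrm{DFS}}^{\mathrm{lo}} \le h_{\mathrm{DFS}}^{\mathrm{lo+}} \le h(\mathbf{z}),$$ (18)

where $h_{\mathrm{DFS}}^{\mathrm{lo+}} = -\mathrm{E}\big[\log_2 \overline{f}^{+}_{\mathrm{DFS}}(\mathbf{v})\big]$.
Substituting the entropy bounds into (3) results in the following
bounds:

$$I_{\mathrm{DFS}}^{\mathrm{lo}} \le I_{\mathrm{DFS}}^{\mathrm{lo+}} \le I(\mathbf{z};\mathbf{d}) \le I_{\mathrm{DFS}}^{\mathrm{up}}.$$ (19)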
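
The sandwich on the exponential sum can be checked numerically on a toy example. The sketch below enumerates all vectors by brute force, which is only feasible for tiny systems, and it sets the squared radius to a multiple `alpha` of the smallest distance as a stand-in for the Babai-based radius; both choices, and the names `distance2` and `t_bounds`, are assumptions made for illustration.

```python
import itertools
import math


def distance2(v, R, d):
    """||v - R d||^2 for an upper-triangular R stored as nested lists."""
    n_t = len(R)
    return sum(
        abs(v[i] - sum(R[i][j] * d[j] for j in range(i, n_t))) ** 2
        for i in range(n_t)
    )


def t_bounds(v, R, symbols, alpha):
    """Return (lower, total, upper) for the sum of exp(-distance)."""
    n_t = len(R)
    dists = sorted(
        distance2(v, R, d) for d in itertools.product(symbols, repeat=n_t)
    )
    radius2 = alpha * dists[0]  # assumed stand-in for alpha * ||v - R d_0||^2
    inside = [x for x in dists if x <= radius2]
    lower = sum(math.exp(-x) for x in inside)  # components found
    # Every pruned component contributes less than exp(-radius2).
    upper = lower + (len(dists) - len(inside)) * math.exp(-radius2)
    total = sum(math.exp(-x) for x in dists)
    return lower, total, upper
```

Taking $-\log_2$ of the three values reverses the ordering, so a lower bound on the sum gives an upper bound on the entropy term and vice versa.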


B. BFS-Based Upper and Lower Bounds 

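For the BFS-based upper and lower bounds, we employ the BFS-based
$K$-best SD approach. Similarly to the DFS-based algorithm, the
BFS-based algorithm finds input symbol vectors satisfying

$$\|\mathbf{v} - \mathbf{R}\mathbf{d}\|^2 \le \xi^2,$$

but the sphere radius $\xi$ is set to a sufficiently large value so
that all components are included within the sphere radius. Differently
from the DFS-based algorithm, the BFS-based algorithm finds the
$K$-closest components at each depth (i.e., each breadth). In more
detail, it takes the $K$ shortest-distance components among $M_c K$
components at each $k$-th depth. Note that when $M_c^{k} \le K$, all
$M_c^{k}$ components are kept at that depth. Hence, $K$ becomes the
control parameter of the BFS-based algorithm for adjusting complexity
versus accuracy, instead of the $\alpha$ parameter of the DFS-based
algorithm. Note that if $K \ge M_c^{N_t}$, all the components are
found at the end of the tree search in the BFS-based algorithm.

After the SD tree search, the following set of ordered symbol vectors
is found:

$$\mathcal{D}_{\mathrm{BFS}}^{(K)} = \big\{\mathbf{d}_1, \mathbf{d}_2, \ldots, \mathbf{d}_{N_{\mathrm{BFS}}^{(K)}}\big\},$$ (20)

where $\mathcal{D}_{\mathrm{BFS}}^{(K)} \subseteq \mathcal{D} = \mathcal{M}^{N_t}$,
$|\mathcal{D}| = M_c^{N_t}$,
$N_{\mathrm{BFS}}^{(K)} = |\mathcal{D}_{\mathrm{BFS}}^{(K)}|$, and
$\|\mathbf{v} - \mathbf{R}\mathbf{d}_1\|^2 \le \|\mathbf{v} - \mathbf{R}\mathbf{d}_2\|^2 \le \ldots \le \|\mathbf{v} - \mathbf{R}\mathbf{d}_{N_{\mathrm{BFS}}^{(K)}}\|^2$.

In the BFS-based algorithm, the relation corresponding to (11) does
not hold since the components found are no longer exactly the
$N$-closest components. However, (12) can still be equivalently
expressed as

$$T = \underbrace{\sum_{\mathbf{d}\in\mathcal{D}_{\mathrm{BFS}}^{(K)}} \exp(-D(\mathbf{d}))}_{\text{components found}} + \underbrace{\sum_{\mathbf{d}\in\mathcal{D}\setminus\mathcal{D}_{\mathrm{BFS}}^{(K)}} \exp(-D(\mathbf{d}))}_{\text{components pruned}}.$$ (21)

Thus, $T$ is lower-bounded by the first term of the right-hand side
of (21). Although we cannot find an upper bound as in (13), the
enhanced lower bound approach on the entropy still works in this
case.

Let us define $\underline{f}_{\mathrm{BFS}}(\mathbf{v})$ and
$\overline{f}^{+}_{\mathrm{BFS}}(\mathbf{v})$ by

$$\underline{f}_{\mathrm{BFS}}(\mathbf{v}) = \frac{1}{M_c^{N_t}\pi^{N_r}} \sum_{\mathbf{d}\in\mathcal{D}_{\mathrm{BFS}}^{(K)}} \exp\big(-\|\mathbf{v} - \mathbf{R}\mathbf{d}\|^2\big),$$ (22)

$$\overline{f}^{+}_{\mathrm{BFS}}(\mathbf{v}) = \underline{f}_{\mathrm{BFS}}(\mathbf{v}) + \frac{1}{M_c^{N_t}\pi^{N_r}} \sum_{\mathbf{d}\in\mathcal{D}\setminus\mathcal{D}_{\mathrm{BFS}}^{(K)}} \exp(-c(\mathbf{d})),$$ (23)

where $c(\mathbf{d})$ denotes the cost value of $\mathbf{d}$ at its
own pruned depth. Then, the differential entropy of $\mathbf{z}$ is
bounded by

$$h_{\mathrm{BFS}}^{\mathrm{lo+}} \le h(\mathbf{z}) \le h_{\mathrm{BFS}}^{\mathrm{up}},$$ (24)

where $h_{\mathrm{BFS}}^{\mathrm{up}} = -\mathrm{E}[\log_2 \underline{f}_{\mathrm{BFS}}(\mathbf{v})]$
and $h_{\mathrm{BFS}}^{\mathrm{lo+}} = -\mathrm{E}\big[\log_2 \overline{f}^{+}_{\mathrm{BFS}}(\mathbf{v})\big]$,
since $\underline{f}_{\mathrm{BFS}}(\mathbf{v}) \le f(\mathbf{z}) \le \overline{f}^{+}_{\mathrm{BFS}}(\mathbf{v})$.
Substituting the entropy bounds into (3) results in the following
bounds:

$$I_{\mathrm{BFS}}^{\mathrm{lo+}} \le I(\mathbf{z};\mathbf{d}) \le I_{\mathrm{BFS}}^{\mathrm{up}}.$$ (25)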

Determination of the $K$ Parameter: The BFS-based bounds algorithm
enables the complexity⁴ to be fixed at a certain value by adjusting
the $K$ parameter, while the DFS-based bounds algorithm controls the
complexity only implicitly through the $\alpha$ parameter. Define
$k_0 = \max\{k \in \{1, \ldots, N_t\} : M_c^{k-1} \le K\}$. Then, the
complexity of the bounds based on the BFS algorithm in terms of the
number of visited nodes in the tree search is given by

$$C(K) = \sum_{k=1}^{k_0} M_c^{k} + \sum_{k=k_0+1}^{N_t} M_c K = \frac{M_c\big(1 - M_c^{k_0}\big)}{1 - M_c} + (N_t - k_0) M_c K.$$ (26)

Note that for $K \to \infty$, we have
$C(\infty) = \sum_{k=1}^{N_t} M_c^{k} = \frac{M_c(1 - M_c^{N_t})}{1 - M_c}$,
which is the complexity of the true Gaussian mixture distribution.
Finally, for a given complexity $C_0$, the $K$ parameter is
determined by

$$K(C_0) = \left\lfloor \frac{1}{(N_t - k_0) M_c} \left( C_0 - \frac{M_c\big(1 - M_c^{k_0}\big)}{1 - M_c} \right) \right\rfloor.$$ (27)
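
As a rough illustration of how $K$ trades complexity, the sketch below counts visited nodes directly from the $K$-best rule (each depth expands the survivors of the previous depth into $M_c$ children each) instead of going through a closed form, and then searches for the largest $K$ that fits a node budget. The function names and the linear search are illustrative assumptions.

```python
def visited_nodes(K, m_c, n_t):
    """Nodes visited by a K-best search over n_t depths with m_c symbols.

    Depth k expands min(K, m_c**(k-1)) survivors into m_c children each.
    """
    return sum(m_c * min(K, m_c ** (k - 1)) for k in range(1, n_t + 1))


def largest_k_within_budget(budget, m_c, n_t):
    """Largest K whose visited-node count fits the budget, or None."""
    best = None
    # K beyond m_c**(n_t-1) no longer changes the count.
    for K in range(1, m_c ** (n_t - 1) + 1):
        if visited_nodes(K, m_c, n_t) <= budget:
            best = K
        else:
            break  # the count is non-decreasing in K
    return best
```

For 4-QAM with three symbols, for instance, $K = 4$ visits $4 + 16 + 16 = 36$ nodes, compared with $4 + 16 + 64 = 84$ for the full tree.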


Table I lists the notations used in the algorithm descriptions that
follow. The overall procedure of the proposed SD approximation
algorithm is specified in Algorithm 1. The DFS-based and BFS-based SD
tree search algorithms used in Algorithm 1 are described as recursive
functions in Algorithm 2 and Algorithm 3, respectively.


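Algorithm 1: Sphere Decoder Approximation

Input: $\mathbf{H}$, $\rho$
Output: $\hat{h}_{\mathrm{SD}}^{\mathrm{up}}$, $\hat{h}_{\mathrm{SD}}^{\mathrm{lo}}$, $\hat{h}_{\mathrm{SD}}^{\mathrm{lo+}}$

[Q R] ← qr($\mathbf{H}$)   // QR factorization
// Integration by a Monte-Carlo method
for i = 1 to $N_d$ do   // Loop for d
    Generate $\mathbf{d}^{(i)}$ by scaling $\mathbf{s}$ to the transmit power $\rho$, where $\mathbf{s} \sim \mathcal{U}(\mathcal{M}^{N_t})$
    for j = 1 to $N_n$ do   // Loop for n
        Generate $\mathbf{n}^{(j)}$ where $\mathbf{n}^{(j)} \sim \mathcal{CN}(\mathbf{0}, \mathbf{I})$
        $\mathbf{z}^{(i,j)}$ ← $\mathbf{H}\mathbf{d}^{(i)} + \mathbf{n}^{(j)}$
        $\mathbf{v}^{(i,j)}$ ← $\mathbf{Q}^H \mathbf{z}^{(i,j)}$
        // Babai estimate
        $\mathbf{d}_0$ ← $\mathbf{H}^{\dagger}\mathbf{z}^{(i,j)}$
        // Call a tree search algorithm
        if DFS then
            Set $\alpha \ge 1$ and $\xi^2$ ← $\alpha\|\mathbf{v}^{(i,j)} - \mathbf{R}\mathbf{d}_0\|^2$
            [$\mathcal{D}_{\mathrm{SD}}$, $\mathcal{E}$] ← DFS({$\mathbf{v}^{(i,j)}$, $\mathbf{R}$, $\xi^2$}, {1, ∅, 0, 0, ∅})
        else if BFS then
            Set $K$ according to (27)
            [$\mathcal{D}_{\mathrm{SD}}$, $\mathcal{E}$] ← BFS({$\mathbf{v}^{(i,j)}$, $\mathbf{R}$, $K$}, {1, ∅, 0, 0})
        // Compute pdfs
        $\underline{f}^{(i,j)}$ ← $\frac{1}{M_c^{N_t}\pi^{N_r}} \sum_{\mathbf{d}\in\mathcal{D}_{\mathrm{SD}}} \exp(-\|\mathbf{v}^{(i,j)} - \mathbf{R}\mathbf{d}\|^2)$
        $\overline{f}^{(i,j)}$ ← $\underline{f}^{(i,j)} + \frac{|\mathcal{D}| - |\mathcal{D}_{\mathrm{SD}}|}{M_c^{N_t}\pi^{N_r}} \exp(-\xi^2)$
        $\overline{f}^{+(i,j)}$ ← $\underline{f}^{(i,j)} + \frac{1}{M_c^{N_t}\pi^{N_r}} \mathcal{E}$
// Compute entropy bounds
$\hat{h}_{\mathrm{SD}}^{\mathrm{up}}$ ← $-\frac{1}{N_d N_n} \sum_{i,j} \log_2 \underline{f}^{(i,j)}$
$\hat{h}_{\mathrm{SD}}^{\mathrm{lo}}$ ← $-\frac{1}{N_d N_n} \sum_{i,j} \log_2 \overline{f}^{(i,j)}$
$\hat{h}_{\mathrm{SD}}^{\mathrm{lo+}}$ ← $-\frac{1}{N_d N_n} \sum_{i,j} \log_2 \overline{f}^{+(i,j)}$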


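Algorithm 2: DFS-Based SD Tree Search

Function DFS({$\mathbf{v}$, $\mathbf{R}$, $\xi^2$}, {$k$, $\mathbf{d}$, $c$, $\mathcal{E}$, $\mathcal{D}_{\mathrm{DFS}}$})
    Store $\mathbf{d}'$ ← $\mathbf{d}$ and $c'$ ← $c$
    for m ← 1 to $M_c$ do
        $\mathbf{d}$ ← [$d_m$ ; $\mathbf{d}'$] where $d_m$ ← $\mathcal{M}(m)$
        Compute the cost value $c$ according to (6)
        if $c \le \xi^2$ then   // Valid: searching
            if $k = N_t$ then   // Leaf node
                $\mathcal{D}_{\mathrm{DFS}}$ ← $\mathcal{D}_{\mathrm{DFS}} \cup \{\mathbf{d}\}$
            else   // Intermediate node: go to next depth
                [$\mathcal{D}_{\mathrm{DFS}}$, $\mathcal{E}$] ← DFS({$\mathbf{v}$, $\mathbf{R}$, $\xi^2$}, {$k+1$, $\mathbf{d}$, $c$, $\mathcal{E}$, $\mathcal{D}_{\mathrm{DFS}}$})
        else   // Invalid: pruning
            // Update the exponential term for the enhanced lower bound
            $\mathcal{E}$ ← $\mathcal{E} + \exp(-c) \cdot M_c^{N_t-k}$
    return $\mathcal{D}_{\mathrm{DFS}}$, $\mathcal{E}$

⁴Throughout this paper, the complexity is evaluated in terms of the number
of visited nodes in a tree search, which is common in the literature on the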
sphere decoding algorithms HI], (23) 
Fig. 3. An example of three approximations according to SNR region (e.g.,
4-PAM): (a) low SNR: single Gaussian approximation; (b) medium SNR:
2-closest components approximation based on the SD tree search, $f^{B}(z)$;
(c) high SNR: Babai estimate-based approximation, $f^{C}(z)$. The red
dashed-dotted lines denote the pdfs of four different Gaussian components,
$f(z, d_i) = p(d_i) f(z|d_i)$, the black line denotes the pdf of the true
Gaussian mixture, $f(z)$, and the blue dashed line with '+' marker denotes
the approximated pdf. The green circle denotes the drawn $z$ in the Monte
Carlo method, for which $f(z)$ has to be approximated.


TABLE I
Notations used in algorithms

Notation                          Description
$N_d$                             Number of iterations for generating $\mathbf{d}$
$N_n$                             Number of iterations for generating $\mathbf{n}$
$\mathcal{U}(\mathcal{M}^{N_t})$  Uniform distribution on the $N_t$-dimension Cartesian
                                  product of the constellation points set $\mathcal{M}$
$\hat{h}$                         Monte-Carlo integration approximation for the entropy


V. SNR-Based Algorithmic Extension 

The complexity of the previous algorithms may still be too high for a
large number of components. In the following subsection, we propose
another approach to further reduce the complexity significantly. For
a given complexity, the approach can also be used to improve the
precision by increasing the number of considered components in the
range where it matters.

The main idea of the extension is to apply different approximation
methods to partial symbol vectors within different SNR regions and to
combine them in order to compute the entropy in the mutual
information. To this end, we first partition the given channel matrix
and input symbol vector into three regions with respect to the SNR:
(i) low SNR, (ii) medium SNR, and (iii) high SNR. Thereafter, we
apply the single Gaussian approximation, the SD upper bound, and the
one-component-only approximation, respectively. Finally, we combine
them over the unified symbol vector. Fig. 3 illustrates a simple
4-pulse amplitude modulation (PAM) example of the three different
approximation methods suitable for different SNRs. In the figure,
each approximated pdf is well matched with the true Gaussian mixture
pdf at the $z$ drawn in the Monte Carlo method. This is the main
motivation for the SNR region-based approximation in this section.

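According to the above partitioning, the received signal model (5)
can be rewritten as

$$\begin{bmatrix} \mathbf{v}_A \\ \mathbf{v}_B \\ \mathbf{v}_C \end{bmatrix} = \begin{bmatrix} \mathbf{A} & \mathbf{B}_A & \mathbf{C}_A \\ \mathbf{0} & \mathbf{B} & \mathbf{C}_B \\ \mathbf{0} & \mathbf{0} & \mathbf{C} \end{bmatrix} \begin{bmatrix} \mathbf{d}_A \\ \mathbf{d}_B \\ \mathbf{d}_C \end{bmatrix} + \begin{bmatrix} \mathbf{w}_A \\ \mathbf{w}_B \\ \mathbf{w}_C \end{bmatrix},$$ (28)

where $\mathbf{A} \in \mathbb{C}^{N_A \times N_A}$,
$\mathbf{B} \in \mathbb{C}^{N_B \times N_B}$, and
$\mathbf{C} \in \mathbb{C}^{N_C \times N_C}$, in which
$N_t = N_A + N_B + N_C$. Let
$\mathrm{diag}(\mathbf{R}) = [\lambda_1, \ldots, \lambda_{N_t}]$.
Assuming the diagonal terms in $\mathbf{R}$ are ordered in increasing
order, the following relations hold with respect to two threshold
values, $\gamma_l$ and $\gamma_h$:

$$\underbrace{\lambda_1 \le \ldots \le \lambda_{N_A}}_{\text{low SNR}} < \gamma_l \le \underbrace{\lambda_{N_A+1} \le \ldots \le \lambda_{N_A+N_B}}_{\text{medium SNR}} \le \gamma_h < \underbrace{\lambda_{N_A+N_B+1} \le \ldots \le \lambda_{N_t}}_{\text{high SNR}}.$$ (29)

Algorithm 3: BFS-Based SD Tree Search

Function BFS({$\mathbf{v}$, $\mathbf{R}$, $K$}, {$k$, $\mathcal{D}$, $\mathcal{C}$, $\mathcal{E}$})
    Set $\mathcal{D}_{\mathrm{cand}}$ ← ∅ and $\mathcal{C}_{\mathrm{cand}}$ ← ∅
    $K'$ ← $\min(K, M_c^{k-1})$   // For $K > M_c^{k-1}$
    for i = 1 to $K'$ do
        $\mathbf{d}'$ ← $\mathcal{D}(i)$ and $c'$ ← $\mathcal{C}(i)$   // i-th element
        for m = 1 to $M_c$ do
            $\mathbf{d}$ ← [$d_m$ ; $\mathbf{d}'$] where $d_m$ ← $\mathcal{M}(m)$
            Compute the cost value $c$ according to (6)
            $\mathcal{D}_{\mathrm{cand}}$ ← $\mathcal{D}_{\mathrm{cand}} \cup \{\mathbf{d}\}$
            $\mathcal{C}_{\mathrm{cand}}$ ← $\mathcal{C}_{\mathrm{cand}} \cup \{c\}$
    // Sort based on the cost values
    [$\mathcal{D}_{\mathrm{sort}}$, $\mathcal{C}_{\mathrm{sort}}$] ← sort($\mathcal{D}_{\mathrm{cand}}$, $\mathcal{C}_{\mathrm{cand}}$)
    if $k = N_t$ then   // Leaf node
        $\mathcal{D}_{\mathrm{BFS}}^{(K)}$ ← the $K$-best elements of $\mathcal{D}_{\mathrm{sort}}$
    else   // Intermediate node
        // Take the K-best elements
        $K''$ ← $\min(K, M_c^{k})$   // For $K > M_c^{k}$
        $\mathcal{D}$ ← first $K''$ elements of $\mathcal{D}_{\mathrm{sort}}$ and $\mathcal{C}$ ← first $K''$ elements of $\mathcal{C}_{\mathrm{sort}}$
        // Update the exponential term
        $\mathcal{E}$ ← $\mathcal{E} + \sum_{c \in \mathcal{C}_{\mathrm{sort}} \setminus \mathcal{C}} \exp(-c) \cdot M_c^{N_t-k}$
        // Go to next depth
        [$\mathcal{D}_{\mathrm{BFS}}^{(K)}$, $\mathcal{E}$] ← BFS({$\mathbf{v}$, $\mathbf{R}$, $K$}, {$k+1$, $\mathcal{D}$, $\mathcal{C}$, $\mathcal{E}$})
    return $\mathcal{D}_{\mathrm{BFS}}^{(K)}$, $\mathcal{E}$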
Consequently, $\mathbf{A}$, $\mathbf{B}$, and $\mathbf{C}$ in (28)
correspond to the low SNR, medium SNR, and high SNR partitions,
respectively, after reordering the original channel matrix, i.e.,
$\tilde{\mathbf{H}} = \mathbf{H}\boldsymbol{\Pi}$ where
$\boldsymbol{\Pi}$ is the permutation matrix, such that the
eigenvalues are sorted in increasing order. The V-BLAST ZF-DFE
channel ordering in [19] provides an eigenvalue ordering method. Note
that the sorting may not be perfect, but it is sufficiently good for
our purpose since the differences are small. Similarly to the
$\alpha$ and $K$ parameters, $\gamma_l$ and $\gamma_h$ are design
parameters which trade off accuracy versus complexity. At medium SNR,
both parameters need to be carefully chosen since they can still
cause a prohibitive computational complexity. A discussion on the
choice of those parameters is provided in Section V-B.
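
The effect of the two thresholds on the block sizes can be sketched as follows. The sketch assumes that the diagonal magnitudes of the triangular factor are already sorted in increasing order and that the lower threshold does not exceed the upper one; `partition_sizes` is a hypothetical helper, and the strict/non-strict comparisons are one possible convention.

```python
def partition_sizes(diag, gamma_l, gamma_h):
    """Return (N_A, N_B, N_C) for sorted diagonal values and two thresholds.

    Values below gamma_l form the low SNR block, values above gamma_h the
    high SNR block, and everything in between the medium SNR block.
    """
    n_a = sum(1 for lam in diag if lam < gamma_l)
    n_c = sum(1 for lam in diag if lam > gamma_h)
    return n_a, len(diag) - n_a - n_c, n_c
```

Only the medium block is handed to the tree search, so moving the thresholds closer together shrinks the tree, while pushing the lower threshold to zero or the upper threshold to infinity removes the low or high SNR block entirely.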


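A. SNR-Based Enhanced Approximation

In this subsection, we propose an SNR-based extension of the SD
approximation method. To this end, we first present three
approximation methods for the three different SNR partitions. Then,
we provide the approximated pdf combining those results.

We start from the high SNR partition corresponding to the block
$\mathbf{C}$. The effective received signal at high SNR can be
approximated by

$$\mathbf{v}_C = \mathbf{C}\mathbf{d}_C + \mathbf{w}_C \approx \mathbf{C}\hat{\mathbf{d}}_C + \mathbf{w}_C,$$ (30)

where $\hat{\mathbf{d}}_C$ is the $\mathbf{d}_C$ drawn in the
Monte-Carlo method; thus, it is known to us for the computation. At
high SNR, this approximation becomes very good due to the negligible
noise, as shown in Fig. 3(c).

By applying the known component for the high SNR block, the effective
received signal at medium SNR is approximated by

$$\mathbf{v}_B = \mathbf{B}\mathbf{d}_B + \mathbf{C}_B\mathbf{d}_C + \mathbf{w}_B \approx \mathbf{B}\mathbf{d}_B + \mathbf{C}_B\hat{\mathbf{d}}_C + \mathbf{w}_B.$$ (31)

For given $\hat{\mathbf{d}}_C$, we have

$$\mathbf{v}'_B = \mathbf{v}_B - \mathbf{C}_B\hat{\mathbf{d}}_C \approx \mathbf{B}\mathbf{d}_B + \mathbf{w}_B.$$ (32)

Similarly as in the previous sections, we apply either the DFS-based
tree search or the BFS-based tree search to (32) instead of (5). For
the DFS-based tree search, the sphere radius is set to
$\xi^2 = \alpha\|\mathbf{v}'_B - \mathbf{B}\mathbf{d}_{0,B}\|^2$
where $\mathbf{d}_{0,B}$ is the Babai estimate corresponding to
$\mathbf{d}_B$. For the BFS-based tree search, $\xi$ is set to a
sufficiently large value and the $K$ parameter is chosen considering
the block size $N_B$. Afterwards, we can find the vector set
$\mathcal{D}_B^{\mathrm{SD}} = \{\mathbf{d}_{B,1}, \mathbf{d}_{B,2}, \ldots, \mathbf{d}_{B,N_{\mathrm{SD}}}\}$,
where either $\mathcal{D}_B^{\mathrm{SD}} = \mathcal{D}_{\mathrm{DFS}}^{(\xi)}$
if the DFS-based tree search is used or
$\mathcal{D}_B^{\mathrm{SD}} = \mathcal{D}_{\mathrm{BFS}}^{(K)}$ if
the BFS-based tree search is used.

Similarly to the medium SNR case, by applying the Babai estimate for
the high SNR block, the effective received signal at low SNR is given
by

$$\mathbf{v}_A = \mathbf{A}\mathbf{d}_A + \mathbf{B}_A\mathbf{d}_B + \mathbf{C}_A\mathbf{d}_C + \mathbf{w}_A \approx \mathbf{A}\mathbf{d}_A + \mathbf{B}_A\mathbf{d}_B + \mathbf{C}_A\hat{\mathbf{d}}_C + \mathbf{w}_A.$$ (33)

For given $\hat{\mathbf{d}}_C$, we have

$$\mathbf{v}'_A = \mathbf{v}_A - \mathbf{C}_A\hat{\mathbf{d}}_C \approx \mathbf{A}\mathbf{d}_A + \mathbf{B}_A\mathbf{d}_B + \mathbf{w}_A.$$ (34)

For each of the $N$-closest vectors,
$\mathbf{d}_{B,m} \in \mathcal{D}_B^{\mathrm{SD}}$, we have

$$\mathbf{v}'_A = \mathbf{A}\mathbf{d}_A + \mathbf{B}_A\mathbf{d}_{B,m} + \mathbf{w}_A.$$ (35)

Hence, for given $\mathbf{d}_{B,m}$, we arrive at

$$\mathbf{v}'_A - \mathbf{B}_A\mathbf{d}_{B,m} = \mathbf{A}\mathbf{d}_A + \mathbf{w}_A,$$ (36)

which follows a Gaussian mixture distribution similar to (32). For
each given $\mathbf{d}_{B,m}$, we approximate the Gaussian mixture
distribution of $\mathbf{v}'_A$ by a single Gaussian distribution
with the same mean and covariance for the low SNR block $\mathbf{A}$,
as shown in Fig. 3(a).

Applying the three different approximations to the three SNR
partitions, the pdf of the unified received symbol vector can be
derived as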


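$$\begin{aligned}
f(\mathbf{v}) &= f(\mathbf{v}_C, \mathbf{v}_B, \mathbf{v}_A) = f(\mathbf{v}_C)\, f(\mathbf{v}_B, \mathbf{v}_A | \mathbf{v}_C) \\
&= \sum_{\mathbf{d}_C \in \mathcal{D}_C} p(\mathbf{d}_C)\, f(\mathbf{v}_C|\mathbf{d}_C)\, f(\mathbf{v}_B, \mathbf{v}_A | \mathbf{v}_C, \mathbf{d}_C) \\
&\overset{(a)}{\ge} p(\hat{\mathbf{d}}_C)\, f(\mathbf{v}_C|\hat{\mathbf{d}}_C)\, f(\mathbf{v}_B, \mathbf{v}_A | \mathbf{v}_C, \hat{\mathbf{d}}_C) \\
&\overset{(b)}{\approx} p(\hat{\mathbf{d}}_C)\, f(\mathbf{v}_C|\hat{\mathbf{d}}_C)\, f(\mathbf{v}_B, \mathbf{v}_A | \hat{\mathbf{d}}_C) \\
&= p(\hat{\mathbf{d}}_C)\, f(\mathbf{v}_C|\hat{\mathbf{d}}_C)\, f(\mathbf{v}_B|\hat{\mathbf{d}}_C)\, f(\mathbf{v}_A|\mathbf{v}_B, \hat{\mathbf{d}}_C) \\
&= p(\hat{\mathbf{d}}_C)\, f(\mathbf{v}_C|\hat{\mathbf{d}}_C) \sum_{\mathbf{d}_B} p(\mathbf{d}_B)\, f(\mathbf{v}_B|\hat{\mathbf{d}}_C, \mathbf{d}_B)\, f(\mathbf{v}_A|\mathbf{v}_B, \hat{\mathbf{d}}_C, \mathbf{d}_B) \\
&\overset{(c)}{\ge} p(\hat{\mathbf{d}}_C)\, f(\mathbf{v}_C|\hat{\mathbf{d}}_C) \sum_{\mathbf{d}_B \in \mathcal{D}_B^{\mathrm{SD}}} p(\mathbf{d}_B)\, f(\mathbf{v}_B|\hat{\mathbf{d}}_C, \mathbf{d}_B)\, f(\mathbf{v}_A|\mathbf{v}_B, \hat{\mathbf{d}}_C, \mathbf{d}_B) \\
&\overset{(d)}{\approx} p(\hat{\mathbf{d}}_C)\, f(\mathbf{v}_C|\hat{\mathbf{d}}_C) \sum_{\mathbf{d}_B \in \mathcal{D}_B^{\mathrm{SD}}} p(\mathbf{d}_B)\, f(\mathbf{v}_B|\hat{\mathbf{d}}_C, \mathbf{d}_B)\, f(\mathbf{v}_A|\hat{\mathbf{d}}_C, \mathbf{d}_B) \\
&= p(\hat{\mathbf{d}}_C)\, f(\mathbf{v}_C|\hat{\mathbf{d}}_C) \sum_{\mathbf{d}_B \in \mathcal{D}_B^{\mathrm{SD}}} p(\mathbf{d}_B)\, f(\mathbf{v}_B|\hat{\mathbf{d}}_C, \mathbf{d}_B) \sum_{\mathbf{d}_A} p(\mathbf{d}_A)\, f(\mathbf{v}_A|\hat{\mathbf{d}}_C, \mathbf{d}_B, \mathbf{d}_A) \\
&\overset{(e)}{\approx} p(\hat{\mathbf{d}}_C)\, f(\mathbf{v}_C|\hat{\mathbf{d}}_C) \sum_{\mathbf{d}_B \in \mathcal{D}_B^{\mathrm{SD}}} p(\mathbf{d}_B)\, f(\mathbf{v}_B|\hat{\mathbf{d}}_C, \mathbf{d}_B)\, f_G(\mathbf{v}_A|\hat{\mathbf{d}}_C, \mathbf{d}_B),
\end{aligned}$$ (37)

where $\hat{\mathbf{d}}_C$ denotes the known symbol vector of the
high SNR block and $\mathcal{D}_B^{\mathrm{SD}}$ denotes the set of
medium SNR vectors found by the SD tree search; (a) is the single
component-based approximation, (c) is the SD upper bound, (e) is the
single Gaussian approximation, and (b) and (d) follow from (31) and
(33). In (37), each term is given by
$p(\mathbf{d}_C) = M_c^{-N_C}$, $p(\mathbf{d}_B) = M_c^{-N_B}$,

$$f(\mathbf{v}_C|\mathbf{d}_C) = \frac{1}{\pi^{N_C}} \exp\big(-\|\mathbf{v}_C - \boldsymbol{\mu}_C\|^2\big),$$ (38)

$$f(\mathbf{v}_B|\mathbf{d}_C, \mathbf{d}_B) = \frac{1}{\pi^{N_B}} \exp\big(-\|\mathbf{v}_B - \boldsymbol{\mu}_B\|^2\big),$$ (39)

$$f_G(\mathbf{v}_A|\mathbf{d}_C, \mathbf{d}_B) = \frac{1}{\pi^{N_A} \det \mathbf{K}_A} \exp\big(-(\mathbf{v}_A - \boldsymbol{\mu}_A)^H \mathbf{K}_A^{-1} (\mathbf{v}_A - \boldsymbol{\mu}_A)\big),$$ (40)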
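Algorithm 4: SNR-Based Enhanced Approximation

Input: $\mathbf{H}$, $\rho$
Output: $\hat{h}_{\mathrm{SDEA}}$

Initialization: Set $\gamma_l$ and $\gamma_h$
$\tilde{\mathbf{H}} = \mathbf{H}\boldsymbol{\Pi}$ according to [19]   // Channel ordering
[Q R] ← qr($\tilde{\mathbf{H}}$)   // QR factorization
// Channel matrix partition
Find $\mathbf{A}$, $\mathbf{B}$, $\mathbf{C}$, $\mathbf{B}_A$, $\mathbf{C}_A$, and $\mathbf{C}_B$ according to (29)
// Integration by a Monte-Carlo method
for i = 1 to $N_d$ do   // Loop for d
    Generate $\mathbf{d}^{(i)}$ by scaling $\mathbf{s}$ to the transmit power $\rho$, where $\mathbf{s} \sim \mathcal{U}(\mathcal{M}^{N_t})$
    for j = 1 to $N_n$ do   // Loop for n
        Generate $\mathbf{n}^{(j)}$ where $\mathbf{n}^{(j)} \sim \mathcal{CN}(\mathbf{0}, \mathbf{I})$
        $\mathbf{z}^{(i,j)}$ ← $\tilde{\mathbf{H}}\mathbf{d}^{(i)} + \mathbf{n}^{(j)}$
        $\mathbf{v}^{(i,j)}$ ← $\mathbf{Q}^H\mathbf{z}^{(i,j)}$
        Find $\mathbf{v}_A^{(i,j)}$, $\mathbf{v}_B^{(i,j)}$, and $\mathbf{v}_C^{(i,j)}$
        // Babai estimate
        Compute $\mathbf{d}_0$ and find $\mathbf{d}_{0,A}$, $\mathbf{d}_{0,B}$, and $\mathbf{d}_{0,C}$
        // (C) High SNR approximation
        Compute $f(\mathbf{v}_C^{(i,j)}|\mathbf{d}_{0,C})$ according to (38)
        // (B) Medium SNR approximation
        $\mathbf{v}_B'^{(i,j)}$ ← $\mathbf{v}_B^{(i,j)} - \mathbf{C}_B\mathbf{d}_{0,C}$
        // Call SD tree search algorithm
        if DFS then
            Set $\alpha \ge 1$ and $\xi^2$ ← $\alpha\|\mathbf{v}_B'^{(i,j)} - \mathbf{B}\mathbf{d}_{0,B}\|^2$
            [$\mathcal{D}_B^{\mathrm{SD}}$, $\mathcal{E}$] ← DFS({$\mathbf{v}_B'^{(i,j)}$, $\mathbf{B}$, $\xi^2$}, {1, ∅, 0, 0, ∅})
        else if BFS then
            Set $K$ according to (27)
            [$\mathcal{D}_B^{\mathrm{SD}}$, $\mathcal{E}$] ← BFS({$\mathbf{v}_B'^{(i,j)}$, $\mathbf{B}$, $K$}, {1, ∅, 0, 0})
        Compute $f(\mathbf{v}_B^{(i,j)}|\mathbf{d}_{0,C}, \mathbf{d}_B)$, $\forall \mathbf{d}_B \in \mathcal{D}_B^{\mathrm{SD}}$, according to (39)
        // (A) Low SNR approximation
        Compute $f_G(\mathbf{v}_A^{(i,j)}|\mathbf{d}_{0,C}, \mathbf{d}_B)$, $\forall \mathbf{d}_B \in \mathcal{D}_B^{\mathrm{SD}}$, according to (40)
        // Compute the pdf of v
        Compute $f(\mathbf{v}^{(i,j)})$ according to (37)
// Compute entropy approximation
$\hat{h}_{\mathrm{SDEA}} = -\frac{1}{N_d N_n} \sum_{i,j} \log_2 f(\mathbf{v}^{(i,j)})$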




where $\boldsymbol{\mu}_C = \mathbf{C}\mathbf{d}_C$,
$\boldsymbol{\mu}_B = \mathbf{B}\mathbf{d}_B + \mathbf{C}_B\mathbf{d}_C$,
$\boldsymbol{\mu}_A = \mathbf{B}_A\mathbf{d}_B + \mathbf{C}_A\mathbf{d}_C$,
and $\mathbf{K}_A = \rho\mathbf{A}\mathbf{A}^H + \mathbf{I}$. Note
that this novel approximation can reduce the tree search complexity
from the order of $M_c^{N_t}$ to the order of $M_c^{N_B}$, where
$N_B$ is determined by both the $\gamma_l$ and $\gamma_h$ parameters.
Since (a) and (c) give lower bounds, when (b), (d), and (e) are very
accurate approximations, the final pdf in (37) can be a lower bound
(equivalently, an upper bound on the entropy). However, in general,
it is an approximation due to (b), (d), and (e). The overall
SNR-based enhanced approximation algorithm is presented in
Algorithm 4.



Fig. 4. The relation among the GB, the SEB, and the true mutual information
according to SNR. $\rho_c$ denotes the SNR corresponding to the intersection
of the GB and the SEB.

B. Discussion on $\gamma_l$ and $\gamma_h$ Parameters

Since the $\gamma_l$ and $\gamma_h$ parameters determine the size of
the submatrix $\mathbf{B}$, they strongly influence the complexity
reduction gain. Basically, if the difference between those parameters
is small, the proposed approximation yields low complexity with some
accuracy loss. On the contrary, as the difference increases, the
approximation converges to the SD upper bound results. The goal is to
set the parameters so that the accuracy loss is still acceptable. In
this subsection, we investigate how the accuracy of the mutual
information changes with those parameters, which provides a guideline
on how to determine them.

Even though we focus on the entropy approximation, our main results are evaluated in terms of the mutual information in Section VI. Thus, we determine the parameters based on the mutual information in this subsection. First of all, there exist two trivial upper bounds on the mutual information: (i) the Gaussian bound (GB), which assumes a Gaussian input distribution and is given by $I(\mathbf{z};\mathbf{d}) = \log_2\det(\mathbf{I} + \rho\mathbf{H}\mathbf{H}^H)$, and (ii) the source entropy bound (SEB), which states that the mutual information cannot exceed the source entropy, i.e., $I(\mathbf{z};\mathbf{d}) = H(\mathbf{d}) - H(\mathbf{d}|\mathbf{z}) \le H(\mathbf{d}) = \log_2 M_c^{N_t}$, since the entropy is non-negative. Fig. 4 shows a typical relation among the GB, the SEB, and the true mutual information according to SNR.
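Both trivial bounds are cheap to evaluate. The sketch below computes them in bits under the definitions given above; the function names are hypothetical, and `m_c` denotes the constellation size $M_c$.

```python
import numpy as np

def gaussian_bound(H, rho):
    """GB: log2 det(I + rho * H H^H), in bits."""
    n_r = H.shape[0]
    _, logdet = np.linalg.slogdet(np.eye(n_r) + rho * (H @ H.conj().T))
    return logdet / np.log(2)

def source_entropy_bound(n_t, m_c):
    """SEB: log2(M_c^Nt) = Nt * log2(M_c), in bits."""
    return n_t * np.log2(m_c)

def trivial_upper_bound(H, rho, m_c):
    """The tighter of the two trivial bounds at a given SNR."""
    return min(gaussian_bound(H, rho), source_entropy_bound(H.shape[1], m_c))
```

Sweeping `rho` and locating the point where the two bounds cross gives the intersection SNR $\rho_c$ shown in Fig. 4.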

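Let $I_{\mathrm{GB}}^{\mathrm{up}}$, $I_{\mathrm{SEB}}^{\mathrm{up}}$, $I_{\mathrm{SD}}^{\mathrm{up}}$, and $I_{\mathrm{SDEA}}$ denote the GB, the SEB, the SD upper bound, and the SD-based enhanced approximation of the mutual information, respectively. Through numerical observations, the basic trends of the mutual information according to the $\gamma_l$ and $\gamma_h$ parameters for a given SNR are illustrated in Fig. 5. In the figures, we draw two mutual information curves by fixing one threshold and varying the other: the thick blue dashed curve fixes $\gamma_l \to 0$ (equivalently, $\gamma_l < \lambda_1$) and varies $\gamma_h$ (the $N_A = 0$ case, the so-called 'BC curve'); the thick red dot-and-dash curve fixes $\gamma_h \to \infty$ (equivalently, $\gamma_h > \lambda_{N_t}$) and varies $\gamma_l$ (the $N_C = 0$ case, the so-called 'AB curve').

The properties of the mutual information of the SNR-based enhanced approximation, $I_{\mathrm{SDEA}}$, with respect to $\gamma_l$ and $\gamma_h$ are as follows:

a) If $\gamma_l \le \gamma_h < \lambda_1$, $I_{\mathrm{SDEA}} = I_{\mathrm{SEB}}^{\mathrm{up}}$.

b) If $\gamma_h > \gamma_l > \lambda_{N_t}$, $I_{\mathrm{SDEA}} = I_{\mathrm{GB}}^{\mathrm{up}}$.

c) If $\gamma_l < \lambda_1$ and $\gamma_h > \lambda_{N_t}$, $I_{\mathrm{SDEA}} = I_{\mathrm{SD}}^{\mathrm{up}}$.

d) The BC curve monotonically decreases from $I_{\mathrm{SEB}}^{\mathrm{up}}$ to $I_{\mathrm{SD}}^{\mathrm{up}}$ as $\gamma_h$ increases from $\lambda_1$ to $\lambda_{N_t}$.

e) The AB curve can exceed $I_{\mathrm{GB}}^{\mathrm{up}}$ at low SNR.

The proofs of the properties are provided in Appendix A.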
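The limiting cases in Properties a to c can be reproduced by counting how many of the values $\lambda_1, \ldots, \lambda_{N_t}$ fall below, between, and above the two thresholds. The sketch below is an illustration only: it assumes that entries with $\lambda_k < \gamma_l$ go to the low-SNR part A, entries with $\lambda_k > \gamma_h$ go to the high-SNR part C, and the rest go to part B. The handling of ties at a threshold is a guess, and the function name is hypothetical.

```python
import numpy as np

def partition_sizes(lam, gamma_l, gamma_h):
    """Return (N_A, N_B, N_C) for thresholds gamma_l <= gamma_h.

    Assumption: lam[k] < gamma_l -> part A, lam[k] > gamma_h -> part C,
    everything in between -> part B (tree search).
    """
    lam = np.asarray(lam, dtype=float)
    n_a = int(np.sum(lam < gamma_l))
    n_c = int(np.sum(lam > gamma_h))
    n_b = lam.size - n_a - n_c
    return n_a, n_b, n_c
```

With both thresholds below $\lambda_1$ every entry lands in part C (Property a), with both above $\lambda_{N_t}$ every entry lands in part A (Property b), and with $\gamma_l < \lambda_1$ and $\gamma_h > \lambda_{N_t}$ every entry lands in part B (Property c).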
Fig. 5. Mutual information according to $\gamma_l$ and $\gamma_h$ at a given SNR $\rho$: (a) the $\rho > \rho_c$ case (i.e., $I_{\mathrm{SEB}}^{\mathrm{up}} < I_{\mathrm{GB}}^{\mathrm{up}}$); (b) the $\rho < \rho_c$ case (i.e., $I_{\mathrm{SEB}}^{\mathrm{up}} > I_{\mathrm{GB}}^{\mathrm{up}}$). The thick blue dashed curve is for fixing $\gamma_l \to 0$ and varying $\gamma_h$ ('BC curve'), and the thick red dot-and-dash curve is for fixing $\gamma_h \to \infty$ and varying $\gamma_l$ ('AB curve'). $I_{\mathrm{AB}}(\gamma_l) - I_{\mathrm{SD}}^{\mathrm{up}}$ and $I_{\mathrm{BC}}(\gamma_h) - I_{\mathrm{SD}}^{\mathrm{up}}$ correspond to approximation errors.


Based on these properties, we next present a proposal for determining $\gamma_l$ and $\gamma_h$ at a given average SNR. Let us define $\Delta\gamma = \gamma_h - \gamma_l$. If $\Delta\gamma > \lambda_{N_t} - \lambda_1$, the approximation results in $I_{\mathrm{SD}}^{\mathrm{up}}$ by setting the thresholds such that Property c is satisfied. Otherwise, we trade off accuracy versus complexity. As shown in Fig. 5, there exists an intersection point of the BC curve and the AB curve. Let us denote the intersection point on the x-axis by $\gamma_c$; then the AB curve is less erroneous on the left-hand side of $\gamma_c$, and the BC curve is less erroneous on its right-hand side. Hence, the best way is to follow the AB curve on the left-hand side and the BC curve on the right-hand side. We propose to determine $\gamma_l$ and $\gamma_h$ proportionally to $\gamma_c - \lambda_1$ and $\lambda_{N_t} - \gamma_c$ with the width $\Delta\gamma$, i.e., $\gamma_l = \gamma_c + \frac{\lambda_1 - \gamma_c}{\lambda_{N_t} - \lambda_1}\Delta\gamma$ and $\gamma_h = \gamma_c + \frac{\lambda_{N_t} - \gamma_c}{\lambda_{N_t} - \lambda_1}\Delta\gamma$. If it

is hard to find $\gamma_c$ due to computational complexity for high
dimension, it can be determined by 7 c = ^ ^ ^7 
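One reading of the proportional rule described above is sketched below. It assumes that the window of width $\Delta\gamma$ is split around $\gamma_c$ in the ratio $(\gamma_c - \lambda_1) : (\lambda_{N_t} - \gamma_c)$ and that all quantities are on the same scale (all linear or all in dB). The function name is hypothetical, and the formula is a reconstruction rather than a quotation.

```python
def choose_thresholds(gamma_c, lam_1, lam_nt, delta_gamma):
    """Split a window of width delta_gamma around gamma_c proportionally.

    Assumption: lam_1 < gamma_c < lam_nt and delta_gamma <= lam_nt - lam_1.
    Returns (gamma_l, gamma_h) with gamma_h - gamma_l == delta_gamma.
    """
    span = lam_nt - lam_1
    gamma_l = gamma_c - (gamma_c - lam_1) / span * delta_gamma
    gamma_h = gamma_c + (lam_nt - gamma_c) / span * delta_gamma
    return gamma_l, gamma_h
```

When `delta_gamma` equals `lam_nt - lam_1`, the window spans exactly $[\lambda_1, \lambda_{N_t}]$, which is the boundary of the full tree search case in Property c.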

Depending on the SNR region, there are two different cases according to the comparison between $I_{\mathrm{GB}}^{\mathrm{up}}$ and $I_{\mathrm{SEB}}^{\mathrm{up}}$, as shown in Fig. 5(a) and (b). In the case of $I_{\mathrm{SEB}}^{\mathrm{up}} > I_{\mathrm{GB}}^{\mathrm{up}}$, the error can make the mutual information exceed $I_{\mathrm{GB}}^{\mathrm{up}}$ if the width obtained by $I_{\mathrm{GB}}^{\mathrm{up}}$ and the two curves (i.e., $\Delta\gamma_{\mathrm{GB}}$) is longer than $\Delta\gamma$. In this case, taking the GB is better than the SD approximation for the given $\Delta\gamma$. Thus, the two threshold values are set to $\gamma_l = \lambda_{N_t} < \gamma_h$ for this case based on Property b. Since the GB is very close to the true curve at low SNR, this setting is reasonable. The same effect can be obtained simply by limiting the mutual information of the enhanced approximation by $I_{\mathrm{GB}}^{\mathrm{up}}$.

In general, we can identify three SNR regions. In the low SNR regime, the Gaussian approximation performs well. In the high SNR regime, the one-component-only approximation performs well. Neither performs well in the medium SNR regime, where the more complex SD approximation yields good results. The previous discussion applies to a given average SNR. If we want to compute the entropy or mutual information over an average SNR range as depicted in Fig. 4, then in principle we have to compute the threshold values for every average SNR value. However, to reduce the complexity even further, we propose to compute the threshold values $\gamma_l$ and $\gamma_h$ for the average SNR $\rho_c$ at which the source entropy bound and the Gaussian bound intersect, and then to scale these thresholds for each average SNR value $\rho$; the scaled values of $\gamma_l$ and $\gamma_h$ are used as threshold values.

VI. Numerical Examples 

In this section, we evaluate the proposed SD bounds and SNR-based enhanced approximations in terms of the mutual information and the complexity, compared to several benchmarks, which are briefly introduced in the following subsection. We consider two kinds of channels for the performance comparisons: (i) a finite impulse response (FIR) filter channel and (ii) a frequency-selective and time-selective fading channel.


A. Benchmarks 

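1) Statistical Approximation (SA) Method: The SA method is analogous to a combination of the high and low SNR approximations in the proposed SNR-based enhanced approximation. That is, it finds the following two pdfs of the received symbol vector for high and low SNRs, respectively,

$$f_h(\mathbf{z}) = \frac{\exp(-\|\mathbf{z} - \mathbf{H}\tilde{\mathbf{d}}\|^2)}{(\pi M_c)^{N_t}}, \qquad f_l(\mathbf{z}) = \frac{\exp(-\mathbf{z}^H\mathbf{K}_z^{-1}\mathbf{z})}{\pi^{N_t}\det(\mathbf{K}_z)}, \tag{42}$$

where $\tilde{\mathbf{d}}$ denotes the drawn $\mathbf{d}$ in the Monte-Carlo expectation and $\mathbf{K}_z = \rho\mathbf{H}\mathbf{H}^H + \mathbf{I}$. Then, the pdf of $\mathbf{z}$ is approximated by

$$f(\mathbf{z}) \approx \max\{f_h(\mathbf{z}), f_l(\mathbf{z})\}. \tag{43}$$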
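A log-domain sketch of the SA approximation is given below. It assumes the single-component form $\exp(-\|\mathbf{z} - \mathbf{H}\tilde{\mathbf{d}}\|^2) / (\pi^{N_r} M_c^{N_t})$ for the high-SNR pdf and a zero-mean complex Gaussian with covariance $\mathbf{K}_z$ for the low-SNR pdf. These forms are inferred from the surrounding definitions rather than quoted, and the function name is hypothetical.

```python
import numpy as np

def sa_log2_pdf(z, H, d_drawn, rho, m_c):
    """log2 of max{f_h(z), f_l(z)} for one Monte-Carlo sample.

    d_drawn is the transmitted vector already scaled by sqrt(rho);
    m_c is the constellation size.
    """
    n_r, n_t = H.shape
    # High SNR: keep only the mixture component of the drawn d (assumed form)
    log_fh = (-np.sum(np.abs(z - H @ d_drawn) ** 2)
              - n_r * np.log(np.pi) - n_t * np.log(m_c))
    # Low SNR: single Gaussian with covariance K_z = rho * H H^H + I
    K_z = rho * (H @ H.conj().T) + np.eye(n_r)
    _, logdet = np.linalg.slogdet(K_z)
    quad = np.real(z.conj() @ np.linalg.solve(K_z, z))
    log_fl = -quad - n_r * np.log(np.pi) - logdet
    return max(log_fh, log_fl) / np.log(2)
```

Working with logarithms avoids underflow of the exponentials for large $N_t$; averaging the negated result over the Monte-Carlo samples gives the corresponding entropy estimate.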

2) BCJR Algorithm Based Computation Method: The BCJR algorithm based computation method was devised to compute information rates for finite-state channels. In this method, for a given finite-state channel, the mutual information between very long input and output sequences is defined as

$$I(\mathbf{z};\mathbf{d}) \triangleq -\frac{1}{n}\log_2 p(z^n) - h(z|d), \tag{44}$$

where $n$ is the sequence length and $z^n = (z_1, z_2, \ldots, z_n)$ denotes the output sequence. Then, it finds $p(z^n)$ based on the forward sum-product algorithm. By employing a state sequence $s_0^n = (s_0, s_1, \ldots, s_n)$ and denoting the input sequence by $d^n = (d_1, d_2, \ldots, d_n)$, $p(z^n)$ can be computed by

$$p(z^n) = \sum_{d^n}\sum_{s_0^n} p(d^n, z^n, s_0^n). \tag{45}$$

Defining the state metric $\mu_k(s_k) = p(s_k, z^k)$ for the $k$-th symbol, (45) can be computed by evaluating the state metrics recursively as

$$\mu_k(s_k) = \sum_{d_k}\sum_{s_{k-1}} \mu_{k-1}(s_{k-1})\, p(d_k, z_k, s_k \mid s_{k-1}) \tag{46}$$

$$= \sum_{d^k}\sum_{s_0^{k-1}} p(d^k, z^k, s_0^k) \tag{47}$$

for $k = 1, \ldots, n$. Finally, (45) is obtained by $p(z^n) = \sum_{s_n}\mu_n(s_n)$.
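The forward recursion can be illustrated for an FIR channel with a small memory. The sketch below is not the benchmark implementation: it assumes i.u.d. inputs, a uniformly distributed initial state, complex noise $n_k \sim \mathcal{CN}(0, 1)$, and per-step normalization of the state metrics so that $\log_2 p(z^n)$ accumulates without underflow. The function name is hypothetical.

```python
import itertools
import numpy as np

def forward_log2_pz(z, g, alphabet=(-1.0, 1.0)):
    """log2 p(z^n) for z_k = sum_i g[i] * d_{k-i} + n_k with n_k ~ CN(0, 1)."""
    L = len(g) - 1
    # A state holds the last L inputs: (d_{k-1}, ..., d_{k-L})
    states = list(itertools.product(alphabet, repeat=L))
    index = {s: i for i, s in enumerate(states)}
    mu = np.full(len(states), 1.0 / len(states))  # uniform initial state
    log2_p = 0.0
    for z_k in z:
        mu_next = np.zeros_like(mu)
        for s, weight in zip(states, mu):
            for d in alphabet:
                mean = g[0] * d + sum(g[i] * s[i - 1] for i in range(1, L + 1))
                lik = np.exp(-abs(z_k - mean) ** 2) / np.pi   # p(z_k | d_k, s_{k-1})
                s_next = ((d,) + s)[:L]                       # shift the new input in
                mu_next[index[s_next]] += weight * lik / len(alphabet)
        scale = mu_next.sum()          # equals p(z_k | z^{k-1})
        log2_p += np.log2(scale)
        mu = mu_next / scale           # normalized state metrics
    return log2_p
```

The number of states is $M_c^L$, so the full trellis is feasible only for moderate memory. Restricting the inner loop to a subset of states at each stage would give a reduced-state variant.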

In order to reduce the computational complexity for channels with a large number of states, the recursion can be modified to yield a lower bound on $p(z^n)$ by taking a subset of states at each stage $k$. Let $\mathcal{S}'_k$ be a subset of states at the $k$-th stage with $Q = |\mathcal{S}'_k|$. The recursion (46) can be modified to

$$\mu_k(s_k) = \sum_{d_k}\sum_{s_{k-1} \in \mathcal{S}'_{k-1}} \mu_{k-1}(s_{k-1})\, p(d_k, z_k, s_k \mid s_{k-1}). \tag{48}$$

This yields an upper bound on $h(z)$, and thus it is called the reduced-state upper bound (RSUB). It is worth noting that reducing the number of states is a similar approach to reducing the number of candidate input vectors in the proposed SD approximation.

3) Hamming Distance 1 (HD1) Based Approximation Method: For the sake of performance comparison, we propose an HD1-based approximation method, which is a simple Gaussian mixture reduction that includes the symbol vectors with Hamming distance one from a pre-chosen symbol vector. Here, we use the Babai estimate as the pre-chosen symbol vector. Hence, based on the Babai estimate $\mathbf{d}_0 = [d_{0,1}, \ldots, d_{0,N_t}]^T$, the candidate symbol vectors are obtained by

$$\mathbf{d}_{i,j} = [d_{0,1}, \ldots, \tilde{d}_{i,j}, \ldots, d_{0,N_t}]^T, \tag{49}$$

where $\tilde{d}_{i,j} \in \mathcal{M}_c \setminus \{d_{0,i}\}$, $i = 1, \ldots, N_t$, $j = 1, \ldots, |\mathcal{M}_c| - 1$. Consequently, we obtain the set of symbol vectors to be added up as $\mathcal{D}_{\mathrm{HD1}} = \{\mathbf{d}_0\} \cup \{\mathbf{d}_{i,j}\}$; thus, the following pdf is obtained:

$$f_m(\mathbf{z}) = \sum_{\mathbf{d} \in \mathcal{D}_{\mathrm{HD1}}} \frac{\exp(-\|\mathbf{z} - \mathbf{H}\mathbf{d}\|^2)}{(\pi M_c)^{N_t}}. \tag{50}$$

Since $f_m(\mathbf{z})$ is good only at medium SNR, by combining it with the high and low SNR approximations in the SA method, the pdf of $\mathbf{z}$ can be approximated by

$$f(\mathbf{z}) \approx \max\{f_h(\mathbf{z}), f_m(\mathbf{z}), f_l(\mathbf{z})\}. \tag{51}$$
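The candidate set of the HD1 method is simple to enumerate. The sketch below builds it from a given pre-chosen vector (the Babai estimate in the text) by replacing one position at a time with every other constellation symbol. The function name is hypothetical.

```python
import numpy as np

def hd1_candidates(d0, alphabet):
    """Return d0 plus all vectors at Hamming distance one from it.

    The result has 1 + Nt * (M_c - 1) rows when every entry of d0
    belongs to the alphabet.
    """
    d0 = np.asarray(d0, dtype=complex)
    cands = [d0.copy()]
    for i in range(d0.size):
        for sym in alphabet:
            if sym != d0[i]:
                d = d0.copy()
                d[i] = sym       # change exactly one position
                cands.append(d)
    return np.array(cands)
```

For binary input and $N_t = 11$ this gives 12 mixture components instead of $2^{11}$, which explains both the low cost of the method and its limited accuracy outside the medium SNR range.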

B. FIR Filter Channel 

As the first example, we consider a memory-10 FIR filter channel with i.u.d. binary input, used in prior work as the largest-memory case, i.e., $z_k = \sum_{i=0}^{10} g_i d_{k-i} + n_k$ with channel coefficients $g_i$. For convenience in the SNR calculation, the sum of the squared channel coefficients is normalized to one. In matrix representation, this channel can be constructed as a Toeplitz matrix with $N_t = 11$ in which each row has the same elements but circularly shifted (i.e., a circulant matrix). Whereas real-valued noise was considered in that prior work, we consider complex-valued noise.
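The matrix representation described above can be sketched as follows. The code builds a circulant matrix whose entry $(k, j)$ is $g_{(k-j) \bmod N}$ after normalizing the taps to unit energy, so that $\mathbf{H}\mathbf{d}$ is the circular convolution of the taps with the input block. The function name and the tap values used in the check are purely illustrative and are not the coefficients of the memory-10 channel.

```python
import numpy as np

def circulant_channel(g):
    """Circulant channel matrix from FIR taps g, normalized to unit energy."""
    g = np.asarray(g, dtype=complex)
    g = g / np.linalg.norm(g)                 # sum of squared taps equals one
    n = g.size
    # Column j is g circularly shifted down by j, i.e. H[k, j] = g[(k - j) % n]
    return np.column_stack([np.roll(g, j) for j in range(n)])
```

Each row of the result is a circular shift of the previous one, and every row has unit energy, which keeps the per-symbol SNR equal to $\rho$.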

Fig. 6(a) shows the mutual information for the memory-10 FIR filter channel with binary input. The GB drawn with Gaussian




Fig. 6. Memory-10 FIR filter channel with binary input: (a) mutual information [bits/symbol]; (b) complexity in terms of the number of visited nodes during the tree search or states during the trellis search. For Monte-Carlo expectation, we use $N_d = 100$ and $N_z = 50$, and for the BCJR method, we
set n = 5 X 10^. 


distributed input provides an upper bound. The true curve can be found by the SD tree search with infinite sphere radius (i.e., $\gamma_{\mathrm{DFS}} \to \infty$). The SA method is the worst, and the HD1 method is better than the SA method at medium SNR. The BCJR method with full trellis and the proposed DFS-based SD upper bound with $\alpha = 1.5$ provide the true curve. The DFS-based SD upper bound with $\alpha = 1$, the BFS-based SD upper bounds, and the BCJR-based RSUB with $Q = 100$ yield some errors as the SNR decreases. For larger $\alpha$ and $K$ parameters, the SD upper bounds become more accurate.

Fig. 6(b) shows the complexity in terms of the number of visited nodes during the tree search or states during the trellis search. The HD1 method requires finding $N_t(M_c - 1)$ neighbor components. The number of visited nodes in the full tree search for the true curve is given by $\sum_{k=1}^{N_t} M_c^k$. The number of visited states in the BCJR method with full trellis at the $N_t$-th
Fig. 7. SD upper bound, lower bound, and enhanced lower bound for DFS (upper figure) and BFS (lower figure) at SNR = −2.5 dB in the memory-10 FIR filter channel with binary input.

stag^ is given by Me ^ ^c- Unlike the BFS-based SD 
upper bound and the BCJR method, the DFS-based SD upper 
bounds show variable complexities according to SNR due to 
fixed sphere radius, i.e., they result in higher complexity at low 
SNR. The complexity of the BFS-based SD upper bound is 
given in (26), and that of the BCJR-based RSUB is obtained by
+ Efclgo+I ‘S) where qo = max{fc : < 

Q}. The BFS with K = 50 has lower complexity than the 
BCJR-based RSUB with Q = 100, while it is much more 
accurate on the mutual information as shown in Fig. 6(a).
Moreover, the BFS with K = 50 is more accurate than the 
DFS with a = 1, while it has much lower complexity when 
SNR < 4 dB. Thus, the BFS is useful for low-complexity 
with a reasonable accuracy. 

Fig. 7 shows trends of the SD bounds according to the control parameters (i.e., $\alpha$ for DFS and $K$ for BFS) in the memory-10 FIR filter channel with binary input at SNR = −2.5 dB. As the parameters increase, the bounds converge to the true curve. Note that for both the DFS and BFS cases, the upper bounds are much tighter than the lower and enhanced lower bounds.
Fig. 8. Frequency-selective and time-selective channel with 4-QAM input, $N_t = 8$, and $L = N_t - 1$: (a) mutual information [bits/symbol]; (b) complexity in terms of the average number of visited nodes during the tree search. For Monte-Carlo expectation, we use $N_d = 50$ and $N_z = 50$. For the enhanced approximation, we use $\alpha = 2$, $10\log_{10}\gamma_l = -4$ dB, and $10\log_{10}\gamma_h = 4$ dB at $\rho_c = 0$ dB. Note that in the randomly realized channel $\mathbf{H}$, $10\log_{10}\lambda_1 = -3.51$ dB and $10\log_{10}\lambda_{N_t} = -3.89$ dB.


C. Frequency-Selective and Time-Selective Fading Channel 
with a Large Memory 

As the second example, we consider a generalized frequency- and time-selective fading channel given by $\mathbf{H} = \mathbf{A}\mathbf{G}$, where $\mathbf{A}$ is the diagonal time-selective channel matrix and $\mathbf{G}$ is the frequency-selective circulant matrix. This channel setup is relevant for realistic WCDMA systems. For $\mathbf{A} = \mathrm{diag}(a_1, \ldots, a_{N_t})$, we assume $a_i \sim \mathcal{CN}(0, 1)$, $\forall i$. For $\mathbf{G}$, we consider a memory-$L$ FIR filter channel, i.e., $z_k = \sum_{i=0}^{L} g_i d_{k-i} + n_k$, where $g_i = 2^{-i}$, $i = 0, \ldots, L$. Note that the BCJR method does not work for these setups due to the time-varying property. Thus, we only take into account the SA and HD1 methods as benchmarks in this channel.

¹For a fair comparison, we consider the complexity corresponding to the first $N_t$ symbols for the BCJR method, since the SD bounds have block length $N_t$.


Fig. 8 shows the mutual information and complexity for the frequency- and time-selective channel with 4-QAM input, $N_t = 8$, and $L = N_t - 1$. The DFS-based upper bound is almost the same as the true curve with much lower complexity, while the BFS-based upper bound has small errors at low SNR as the price of its lower complexity. As observed in Fig. 7, the upper bounds are much tighter than the lower bounds. The DFS-based lower bound has an approximately constant gap to the upper bound, whereas the BFS-based lower bound is tight at high SNR but loose at low SNR. The enhanced approximation approaches the true curve with a small gap but much lower complexity at low and high SNRs, while the SA and HD1 methods have large errors at moderate SNR.

Fig. 9 shows the mutual information for the frequency- and time-selective channel with 4-QAM input, $N_t = 40$, and $L =$


Appendix A 

Proofs of Properties on $\gamma_l$ and $\gamma_h$

a) Property a corresponds to the case of $N_C = N_t$. Thus, we show that the mutual information of the Babai estimate-based approximation with $N_C = N_t$ is equivalent to $I_{\mathrm{SEB}}^{\mathrm{up}}$. Instead of (30), the effective received signal becomes $\mathbf{v} \approx \mathbf{R}\mathbf{d} + \mathbf{w}$. Applying the single-component-only approximation, the pdf of $\mathbf{v}$ is given by

$$f(\mathbf{v}) \approx \frac{\exp(-\|\mathbf{v} - \mathbf{R}\mathbf{d}\|^2)}{(\pi M_c)^{N_t}} = \frac{\exp(-\|\mathbf{w}\|^2)}{(\pi M_c)^{N_t}}.$$

Then, the mutual information is derived as

Fig. 9. Mutual information in the frequency-selective and time-selective channel with 4-QAM input, $N_t = 40$, and $L = N_t - 1$. For Monte-Carlo expectation, we use $N_d = 50$ and $N_z = 50$. For the enhanced approximation, we use $\alpha = 1$, $10\log_{10}\gamma_l = 3$ dB, and $10\log_{10}\gamma_h = 5$ dB at $\rho_c = 0$ dB.


$N_t - 1$. Computing the true curve is impossible due to the huge problem size, i.e., $M_c^{N_t} = 4^{40} \approx 1.2 \times 10^{24}$. The SD bounds are also unavailable within a reasonable simulation time. Therefore, we compare the enhanced approximation with the SA and HD1 methods. Both the SA and HD1 methods almost reach the two trivial upper bounds, i.e., the GB and the SEB, for this large size case, while the enhanced approximation still yields a reasonable curve below them. Note that from the properties of $\gamma_l$ and $\gamma_h$ discussed earlier and Fig. 5, we can conjecture that the true curve lies below the enhanced approximation. The complexity of
the enhanced approximation is about 10^ at SNR = 4 dB and 
less than 10^ in the other SNRs, while the complexity of full 
tree search for the true curve is $4(4^{40} - 1)/3 \approx 1.6 \times 10^{24}$. Comparing Fig. 9 with Fig. 8(a), the mutual information decreases overall for a large block size, but in general it will depend on the channel realization.
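The problem sizes quoted for the $N_t = 40$, 4-QAM case follow directly from the counting formulas and can be checked with exact integer arithmetic. The helper names below are hypothetical.

```python
def num_input_vectors(m_c, n_t):
    """Number of Gaussian mixture components, M_c^Nt."""
    return m_c ** n_t

def full_tree_nodes(m_c, n_t):
    """Nodes visited by a full tree search: sum of M_c^k for k = 1..Nt."""
    return sum(m_c ** k for k in range(1, n_t + 1))

def hd1_components(m_c, n_t):
    """Neighbor components needed by the HD1 method, Nt * (M_c - 1)."""
    return n_t * (m_c - 1)
```

For $M_c = 4$ and $N_t = 40$, the first function returns $4^{40}$, roughly $1.2 \times 10^{24}$, and the second returns $4(4^{40} - 1)/3$, roughly $1.6 \times 10^{24}$, whereas the HD1 method needs only 120 neighbors.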


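$$I(\mathbf{z};\mathbf{d}) \approx \mathrm{E}\left[\log_2(\pi M_c)^{N_t} + \log_2 e^{\|\mathbf{w}\|^2}\right] - \log_2(\pi e)^{N_t}$$

$$= \log_2 M_c^{N_t} + \log_2 e^{N_t} - \log_2 e^{N_t} = I_{\mathrm{SEB}}^{\mathrm{up}},$$

since $\mathrm{E}[\|\mathbf{w}\|^2] = N_t$ and $I_{\mathrm{SEB}}^{\mathrm{up}} = H(\mathbf{d}) = \log_2 M_c^{N_t}$. ∎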

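b) Property b corresponds to the case of $N_A = N_t$. Thus, we show that the mutual information of the single Gaussian approximation with $N_A = N_t$ is equivalent to $I_{\mathrm{GB}}^{\mathrm{up}}$. First of all, $I_{\mathrm{GB}}^{\mathrm{up}} = \log_2\det(\rho\mathbf{R}\mathbf{R}^H + \mathbf{I})$ since $\mathrm{E}[\mathbf{v}\mathbf{v}^H] = \mathrm{E}[(\mathbf{R}\mathbf{d} + \mathbf{w})(\mathbf{R}\mathbf{d} + \mathbf{w})^H] = \rho\mathbf{R}\mathbf{R}^H + \mathbf{I}$. Applying the single Gaussian approximation, the pdf of $\mathbf{v}$ is

$$f(\mathbf{v}) \approx \frac{\exp(-\mathbf{v}^H\mathbf{K}_v^{-1}\mathbf{v})}{\pi^{N_t}\det(\mathbf{K}_v)},$$

where $\mathbf{K}_v = \rho\mathbf{R}\mathbf{R}^H + \mathbf{I}$ since $\mathbf{A} = \mathbf{R}$. Hence, the mutual information is derived as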


VII. Conclusion 

We have proposed novel complexity-efficient algorithmic solutions to approximate the entropy of Gaussian mixture distributions with a large number of components. The algorithms allow trading off accuracy against complexity, and the approximations are asymptotically tight with unbounded complexity. The extended approach can even deal with very high system dimensions with reasonable accuracy, which was not possible previously. The computation of the entropy of Gaussian mixture distributions is important for many problems, e.g., data fusion and machine learning. In particular, it can be used to approximate the mutual information of a vector-valued Gaussian channel with finite input alphabets. In contrast to other methods, the proposed algorithms are applicable to any linear input-output relation. The proposed concepts can be easily adapted or extended to other application areas. For future work, the concept and methods developed in this work can be extended to deal with more general Gaussian mixture distributions with heterogeneous covariance structures, including improper complex signals.


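$$I(\mathbf{z};\mathbf{d}) \approx \mathrm{E}\left[\log_2 \pi^{N_t} + \log_2\det(\mathbf{K}_v) + \log_2\exp\left((\mathbf{K}_v^{-1/2}\mathbf{v})^H\mathbf{K}_v^{-1/2}\mathbf{v}\right)\right] - \log_2(\pi e)^{N_t}$$

$$= \log_2\det(\mathbf{K}_v) = I_{\mathrm{GB}}^{\mathrm{up}},$$

since $\mathbf{K}_v$ is Hermitian and $\mathbf{K}_v^{-1/2}\mathbf{v} \sim \mathcal{CN}(\mathbf{0}, \mathbf{I})$. ∎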

c) The proof of Property c is straightforward since this corresponds to the case of $N_B = N_t$. ∎

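d) Since $N_A = 0$ and $N_B + N_C = N_t$ for the BC curve, $f(\mathbf{v}) = f(\mathbf{v}_C) f(\mathbf{v}_B|\mathbf{v}_C)$ by the factorization of the pdf of $\mathbf{v}$, where

$$f(\mathbf{v}_C) \approx \frac{\exp(-\|\mathbf{w}_C\|^2)}{(\pi M_c)^{N_C}}, \qquad f(\mathbf{v}_B|\mathbf{v}_C) \approx \sum_{m=1}^{|\mathcal{D}_B|} \frac{\exp(-\|\mathbf{B}(\mathbf{d}_B - \mathbf{d}_{B,m}) + \mathbf{w}_B\|^2)}{(\pi M_c)^{N_B}},$$

where $\mathbf{d}_{B,m}$ is the $m$-th vector in $\mathcal{D}_B$. Denoting $\mathbf{e}_{B,m} = \mathbf{B}(\mathbf{d}_B - \mathbf{d}_{B,m})$, the mutual information is derived as

$$
\begin{aligned}
I(\mathbf{z};\mathbf{d}) &\approx \mathrm{E}\Big[\log_2(\pi M_c)^{N_C} + \log_2 e^{\|\mathbf{w}_C\|^2} - \log_2 \sum_{m=1}^{|\mathcal{D}_B|} e^{-\|\mathbf{e}_{B,m} + \mathbf{w}_B\|^2} + \log_2(\pi M_c)^{N_B}\Big] - \log_2(\pi e)^{N_t} \\
&\overset{(a)}{=} \log_2 M_c^{N_t} - \mathrm{E}\Big[\log_2 \sum_{m=1}^{|\mathcal{D}_B|} e^{-\|\mathbf{e}_{B,m} + \mathbf{w}_B\|^2}\Big] + \log_2 e^{N_C} - \log_2 e^{N_t} \\
&= \log_2 M_c^{N_t} - \mathrm{E}\Big[\log_2 \underbrace{\sum_{m=1}^{|\mathcal{D}_B|} e^{\|\mathbf{w}_B\|^2 - \|\mathbf{e}_{B,m} + \mathbf{w}_B\|^2}}_{\Lambda}\Big],
\end{aligned}
\tag{A.1}
$$
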


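where (a) comes from $\mathrm{E}[\|\mathbf{w}\|^2] = \mathrm{E}[\|\mathbf{w}_B\|^2 + \|\mathbf{w}_C\|^2] = N_t$. If $\Lambda \ge 1$, the second term of (A.1) becomes non-positive, and therefore $I(\mathbf{z};\mathbf{d}) \le I_{\mathrm{SEB}}^{\mathrm{up}}$ always holds.

If $\mathbf{d}_B \in \mathcal{D}_B$, $\Lambda$ includes the term $\exp(\|\mathbf{w}_B\|^2 - \|\mathbf{e}_{B,m} + \mathbf{w}_B\|^2) = 1$ and thus $\Lambda \ge 1$. If $\mathbf{d}_B \notin \mathcal{D}_B$, the vectors in $\mathcal{D}_B$ yield shorter Euclidean distances than $\mathbf{d}_B$. That is, for all $\mathbf{d}_{B,m} \in \mathcal{D}_B$, $\exp(\|\mathbf{w}_B\|^2 - \|\mathbf{e}_{B,m} + \mathbf{w}_B\|^2) > 1$ since $\|\mathbf{w}_B\|^2 > \|\mathbf{e}_{B,m} + \mathbf{w}_B\|^2$. As a result, $\Lambda > |\mathcal{D}_B| \ge 1$. Therefore, $I(\mathbf{z};\mathbf{d}) \le I_{\mathrm{SEB}}^{\mathrm{up}}$ holds.

If $\gamma_h < \lambda_1$, then $I(\mathbf{z};\mathbf{d}) = I_{\mathrm{SEB}}^{\mathrm{up}}$ by Property a. In addition, if $\gamma_h > \lambda_{N_t}$, then $I(\mathbf{z};\mathbf{d}) = I_{\mathrm{SD}}^{\mathrm{up}}$ by Property c. As $\gamma_h$ increases from $\lambda_1$, $N_B$ becomes non-zero and $\Lambda$ contains exponential terms. For further increasing $\gamma_h$, if $N_B$ increases by one, then $\Lambda$ has $M_c$ additional exponential terms. Since $\exp(\cdot) > 0$, $\Lambda$ gradually increases as $\gamma_h$ increases. Therefore, $I(\mathbf{z};\mathbf{d})$ monotonically decreases from $I_{\mathrm{SEB}}^{\mathrm{up}}$ to $I_{\mathrm{SD}}^{\mathrm{up}}$ as $\gamma_h$ increases. ∎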

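e) Since $N_C = 0$ and $N_A + N_B = N_t$ for the AB curve, $f(\mathbf{v}) \approx \sum_{m=1}^{|\mathcal{D}_B|} p(\mathbf{d}_{B,m}) f(\mathbf{v}_B|\mathbf{d}_{B,m}) f(\mathbf{v}_A|\mathbf{d}_{B,m})$. Denoting $\mathbf{e}_{B,m} = \mathbf{B}(\mathbf{d}_B - \mathbf{d}_{B,m})$ and $\mathbf{v}_{A,m} = \mathbf{v}_A - \mathbf{B}_A\mathbf{d}_{B,m} = \mathbf{A}\mathbf{d}_A + \mathbf{B}_A(\mathbf{d}_B - \mathbf{d}_{B,m}) + \mathbf{w}_A$, the pdfs are written as

$$p(\mathbf{d}_{B,m}) = \frac{1}{M_c^{N_B}}, \qquad f(\mathbf{v}_B|\mathbf{d}_{B,m}) = \frac{\exp(-\|\mathbf{e}_{B,m} + \mathbf{w}_B\|^2)}{\pi^{N_B}}, \qquad f(\mathbf{v}_A|\mathbf{d}_{B,m}) = \frac{\exp(-\mathbf{v}_{A,m}^H\mathbf{K}_A^{-1}\mathbf{v}_{A,m})}{\pi^{N_A}\det(\mathbf{K}_A)},$$

where $\mathbf{K}_A = \rho\mathbf{A}\mathbf{A}^H + \mathbf{I}$. Then, the mutual information is derived as

$$
\begin{aligned}
I(\mathbf{z};\mathbf{d}) &\approx -\mathrm{E}\Big[\log_2 \sum_{m=1}^{|\mathcal{D}_B|} p(\mathbf{d}_{B,m}) f(\mathbf{v}_B|\mathbf{d}_{B,m}) f(\mathbf{v}_A|\mathbf{d}_{B,m})\Big] - \log_2(\pi e)^{N_t} \\
&\overset{(a)}{=} \log_2 M_c^{N_B} + \log_2 \pi^{N_A + N_B} + \log_2\det(\mathbf{K}_A) - \mathrm{E}\Big[\log_2 \sum_{m=1}^{|\mathcal{D}_B|} e^{-\|\mathbf{e}_{B,m} + \mathbf{w}_B\|^2 - \mathbf{v}_{A,m}^H\mathbf{K}_A^{-1}\mathbf{v}_{A,m}}\Big] - \log_2 \pi^{N_t} - \mathrm{E}\big[\log_2 e^{\|\mathbf{w}\|^2}\big] \\
&\overset{(b)}{=} \log_2 M_c^{N_B} + \log_2\det(\rho\mathbf{A}\mathbf{A}^H + \mathbf{I}) - \mathrm{E}\Big[\log_2 \sum_{m=1}^{|\mathcal{D}_B|} e^{\|\mathbf{w}_B\|^2 - \|\mathbf{e}_{B,m} + \mathbf{w}_B\|^2 + \|\mathbf{w}_A\|^2 - \mathbf{v}_{A,m}^H\mathbf{K}_A^{-1}\mathbf{v}_{A,m}}\Big],
\end{aligned}
$$

where (a) comes from $\mathrm{E}[\|\mathbf{w}\|^2] = N_t$, and (b) comes from $\|\mathbf{w}\|^2 = \|\mathbf{w}_A\|^2 + \|\mathbf{w}_B\|^2$. As $\rho \to 0$, $\log_2\det(\rho\mathbf{A}\mathbf{A}^H + \mathbf{I}) \approx 0$ and we have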


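$$\sum_{m=1}^{|\mathcal{D}_B|} e^{\|\mathbf{w}_B\|^2 - \|\mathbf{e}_{B,m} + \mathbf{w}_B\|^2 + \|\mathbf{w}_A\|^2 - \|\mathbf{w}_A\|^2} = \Lambda \overset{(c)}{\ge} 1,$$

where (c) comes from the proof of Property d. Therefore, for $\rho \to 0$, we have $I(\mathbf{z};\mathbf{d}) = \log_2 M_c^{N_B} - \mathrm{E}[\log_2 \Lambda] \ge \log_2 M_c^{N_B} - \mathrm{E}\big[\log_2 \sum_{m=1}^{|\mathcal{D}_B|} e^{\|\mathbf{w}_B\|^2}\big] = -\big(\log_2 \frac{|\mathcal{D}_B|}{M_c^{N_B}} + N_B\log_2 e\big) \ge -N_B\log_2 e$, which can be positive, while $I_{\mathrm{GB}}^{\mathrm{up}} = \log_2\det(\rho\mathbf{R}\mathbf{R}^H + \mathbf{I}) \approx 0$ as $\rho \to 0$. Therefore, the AB curve can exceed $I_{\mathrm{GB}}^{\mathrm{up}}$ at low SNR. ∎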